SYMPOSIUM





CLINICAL AND STATISTICAL PREDICTION REVISITED 
MARVIN W. KAHN¹
University of Colorado School of Medicine 


Meehl has stated in no uncertain terms his lack of esteem for clinical prediction
and the clinical use of psychological tests. Even though there has been little
rebuttal of his point of view, clinical evaluation (we know of no actuarial data on the
subject) suggests that few practicing clinicians would heartily endorse such
conclusions of his as: “Putting it bluntly, it suggests that for a wide range of clinical
problems involving personality descriptions from tests, the clinical interpreter is a
costly middle man who might be eliminated,” or “The several hours of
highly skilled work sometimes expended in arriving at a dynamic formulation of the
patient by an ingenious extrapolation of test results could very possibly be spent
much better in added hours of psychotherapy.”

Holt has discussed the comparability of the studies cited by Meehl in support
of his conclusions as to which method, clinical or actuarial, predicts more accurately.
These comparisons, Holt maintains, pitted very crude clinical prediction
against cross-validated actuarial predictions, and hence, no fair comparison was 
made. In this respect, several studies, such as the recent investigation of accident 
repeaters by Conger et al., have demonstrated the effectiveness of “blind” clinical
prediction, when no amount of actuarial juggling of the data yielded discriminating 
results. 

In the early chapters of “Clinical vs. Statistical Prediction” Meehl has presented
a rather thorough, if somewhat theoretical, analysis of the nature of clinical
activity. This includes a rather convincing rebuttal to contentions, such as Sarbin’s 
(7) that the clinician is at best a second-rate actuarian. Meehl points out that in 
formulating predictions, it is necessary to decide which variables are to be considered. 
Their selection must stem from hypotheses concerning the kinds of factors that 


¹Based on a paper read as part of a Symposium entitled Clinical and Statistical Prediction, at
the annual meeting of the Rocky Mountain Psychological Association in Santa Fe, New Mexico, 1958.


EDITORIAL COMMENT


Although this Journal ordinarily gives first priority to actual research, this symposium is
being published because it deals cogently with a very important problem. Dr. Kahn makes a very 
important point when he emphasizes that clinical judgment should be confined to predicting be- 
havioral events directly related to individual personality factors for which some valid criteria have 
been established either in terms of theoretical concepts or actuarial data. It should be stressed that 
clinical judgments can be no more valid than the validity of the constructs or criteria presumably 
being judged. If “schizophrenia” is not a valid diagnostic entity, or if various clinical indices bear no
valid relationship to the disorder if any actually exists as such, then clinical judgments depending 
upon such invalid constructs must also be invalid.

A still more critical factor relating to the validity of clinical judgment in general concerns the 
validity of the judgments of particular clinicians. Convincing evidence is accumulating that many 
so-called clinicians are making no better than chance predictions. When evaluated objectively, only 
the judgments of a few clinicians have demonstrated validity. Research which samples the averaged 
judgments of good and bad clinicians tends to produce no better than chance predictions because the 
superior judgments of the good clinicians are balanced off by the invalid judgments of the poor. The 
crucial test of Meehl’s hypothesis is to compare the judgments of the best clinicians with the best 
actuarial predictions. He would unquestionably find that superior clinicians produce insights which no 
actuarial methods could come up with because they represent unique judgments of unique and rare 
events. 

F.C. T. 
might be relevant to the prediction. Creative hypothesis development is seen as a
unique skill, necessary to prediction, and something that an actuarial clerk cannot 
mechanically achieve. He acknowledges that for clinical problems a clinician is 
crucial and irreplaceable as the source of hypotheses. 

Furthermore, he notes frequency data of significant clinical meaning often can- 
not be clerically tabulated because unique individual responses of rare frequency are 
common. Because of the unique life experience of each individual, many of his be- 
havioral responses are unique to him. However, on the basis of clinical hypotheses, 
these individualistic responses can be related to broad general classes of hypotheses 
from which prediction then can be made. As a hypothesis creator, Meehl maintains, 
the clinician is no more a crystal-ball gazer than an hypothesis creator in any other 
field of science. Obviously, however, once having generated an hypothesis, no matter 
what its source, he is obligated, like other scientists, to support or reject it on empiri- 
cal grounds, if, in fact, this is possible. 

In view of his apparent awareness of the nature of clinical activity, it is surpris- 
ing that Meehl chooses the kind of research he does to compare clinical and actuarial 
prediction. It seems questionable to us whether the kind of clinical activity Meehl 
himself initially describes, was in fact employed in the clinical studies used in the 
comparison. In addition, one may ask what sorts of criterion variables may be
properly predicted by this sort of clinical activity. 

A key issue appears to be the definition of the term “clinical”. The studies cited 
by Meehl seem to imply that “clinical” means any subjective weighting and combining
of data to predict to a precise criterion. By this standard, any guess about
anything might conceivably be called clinical. To define “clinical” as that which is 
non-actuarial is much too broad a definition to be useful, and is, of course, an easily 
toppled straw man. 

The scope of the definition must be narrowed and made more specific in order to 
be used meaningfully in any comparison of methods. One dictionary defines “clinical”
as “occupied with investigation of disease in the living subject by observation,
as distinguished from controlled experimentation.” As a working delineation of the
term for psychology, this definition might be modified to mean “occupied with
investigation of personality in the living individual by systematic observation.” Limiting
the scope of the definition to personality has the effect, for psychology, of limiting
the criteria to which the clinician might reasonably be expected to predict—a crucial 
consideration in our opinion. To expect the clinical method to predict anything 
which any other method might attempt to predict is ridiculous. 

From our stated definition, clinical predictions should attempt to predict only 
criteria which are directly relevant to personality, broadly defined. Thus, we would 
include the individual’s general capacities and acquired aptitudes, his modes of 
adjusting and coping with situations, and his motivations and affects, as well as 
theoretical accounts of genetic factors concerning the etiology of this individual’s 
uniqueness. From this general definition of clinical, one can attempt to answer the 
question of the proper subject matter for clinical prediction. Thus, appropriate 
clinical prediction would be confined to behavioral events which are largely a func- 
tion of personality factors, and which can be appropriately derived from systematic 
observation. From an examination of the kinds of predictions involved in the pitting 
of actuarial against clinical methods in the Meehl studies, it will be noted that the 
criteria employed often involved complex behavior which frequently included many 
factors other than personality. This was particularly true of studies such as those 
which used success in flight-training or success as a parole risk as criterion variables. 

Even behavioral events of more direct clinical reference, such as suicide, or 
response to psychotherapy, were beset by many difficulties in terms of the multitude 
of extra-personality variables which might influence the criterion behavior. Suicide, 
for instance, may appear on the surface as a relatively clear-cut behavioral criterion 
regarding which one might reasonably make predictions from inferences about personality
characteristics derived from clinical evaluation. However, more sober reflection
suggests that even suicide may be influenced by such additional external
factors as the amount of supervision a patient receives, the type of situational frus- 
trations he runs into, economic and social circumstances, availability and adequacy 
of psychotherapy, and so on. A clinician on the basis of knowledge of the patient’s 
personality might justifiably make a prediction concerning the overall likelihood of 
suicide in a certain individual, but environmental factors would still be an important 
limiting consideration. 

By restricting the term clinical to the subject matter of personality, one narrows 
considerably the range of variables which may reasonably be predicted. In many 
instances there may well be no accurate external criterion possible, and one will be 
thrown back upon the use of construct validity, as Cronbach and Meehl point out.
Further, the clinician is often more likely to be concerned with the relative unique- 
ness of the individual than with his similarities to other groups of individuals. While 
individual behavior can be abstracted to be grouped with broader classes of behavior, 
the clinical task is frequently concerned mainly with the individual and his unique- 
ness. 

Often the uniqueness that the clinician pursues may be obscured by the ab- 
straction of data required by an actuarial approach. For example, we might contrast 
an abstracted Rorschach response with the actual verbalization. While the scored 
response represented as “W-M-H” tells us something of the individual, the actual
verbalization “Oooh, a big, burly football player coming over to crush you” tells the
clinician even more. It might be argued that if we involved all the above data in 
frequency tables, we would do better. But the problem, as Meehl has pointed out, 
is that so many infrequent or unique responses occur that it would be impractical 
actuarially to deal with so many variables.

Our impression is that the crucial comparison of the two methods remains to be 
done. When the criterion may be influenced by environmental factors, which may 
vary considerably from individual to individual, such as the many individual var- 
iables involved in successful parole, a best fit formula might reasonably be expected 
to handle both situational and personality factors and should be more accurate. 
However, where this is not so, as in the case of situations where there is little var- 
iability from individual to individual in the operation of environmental variables, or 
where the criterion itself is of such a nature as to be relatively independent of such 
variability, then knowledge of uniqueness should prove superior to a formula which 
makes the best guess in terms of the average individual. From this standpoint, 
proper clinical prediction can be made adequately only with respect to behavioral 
responses to clearly defined stimulus situations, and often is limited to prediction
within construct validity. 

The clinical task is not, as Meehl would imply, primarily one of accurately pre- 
dicting gross response to complicated environmental situations. Rather, it is con- 
cerned primarily with two inter-related activities, the evaluative and the thera- 
peutic. The usual goal is to alter behavior in the direction of more effective function- 
ing, whether this involves revised personal constructs, self-acceptance or genital 
sexuality. 

There are, of course, many implied predictions in this sort of activity, but the 
predictions occur largely within a closed system of construct validity. The evaluative 
phase predicts certain theoretical relationships in terms of the particular system be- 
ing used. The therapeutic phase involves predicting certain activities of the patient 
and the therapist together, to lead to certain postulated behavioral changes. These 
are the basic clinical predictions. Prediction to situations of social context immed- 
iately introduces many variables which limit the power of clinical predictions. 

Evaluation conceived of as diagnostic labeling has clearly limited usefulness. 
But, as an assessment of the patient’s total personality, including abilities, conflicts 
and dynamics, evaluation gives valuable leads as to the optimal type of treatment, 
and the kinds of problems and defenses that will have to be dealt with, whether the 
treatment involves insight, support or environmental manipulation. Such evaluation 
is an integral first step in treatment and if properly used can be an effective aid.
Alexander has discussed the importance of having a good over-view of a
patient before launching into treatment. He states that “the analyst during this
period may be compared to a traveler standing on top of a hill overlooking the 
country through which he is about to journey. At this time it might be possible for 
him to see his whole anticipated journey and perspectives. When once he has 
descended into the valley, this perspective must be retained in his memory or else it 
will be gone. From this time on, he will be able to examine small parts of this land- 
scape in much greater detail than was possible when he was viewing them from a 
distance, but the broad relations will no longer be so clear.” 

The process of psychotherapy does not seem to us to be essentially different 
from the process of evaluation, in that both make inferences from the patient’s be- 
havior about such things as his conflicts, dynamics, defenses, etc. The basic differ- 
ence is that treatment aims to help the patient to change, while evaluation is pri- 
marily concerned with assessing the patient and his potential for change. In both 
instances, however, the clinician must assess and form hypotheses about behavior. 
The more thoroughly the therapist understands the patient the better he can eval- 
uate each segment of therapy behavior and be of more efficient help. Only after 
evaluation can the therapist determine treatment goals and the appropriate manner 
of dealing with each segment of behavior. Evaluation in terms of the theoretical 
system being used aids the therapist, for instance, to determine when to reflect, 
interpret or perhaps reassure. Adequate evaluation of the patient is necessary before 
starting therapy, as well as for dealing with each response during therapy. 

Since we see evaluation and treatment as involving similar clinical skills and 
basic aims, we cannot share Meehl’s view that on the one hand evaluation is
useless, while, on the other hand, therapy is to be highly respected. The empirical
validity of the effectiveness of neither is clearly established, and construct validity 
would appear in many instances to be the only way to test either at present. But
they are part and parcel of the same process and their worth cannot be meaningfully
dichotomized. We feel that Meehl’s conclusions are premature and imply sterility
in an area where, in fact, much fertility may lie. 
CAN THE COMPUTER SUPPLANT THE CLINICIAN?¹


WAYNE H. HOLTZMAN 
The University of Texas 


Twenty years ago, much of what is commonplace today in the computer world 
would have been laughed off as science fiction, surely beyond attainment within 
the foreseeable future. And yet today we have a number of commercially available 
computers which can operate at such amazing speeds that a symmetric, 80-by-80 
matrix can be inverted in 75 seconds, a tedious task of almost impossible magnitude 
before the advent of high-speed computers. Of course, most such computers are too 
busy for routine application in psychology, but the opportunity is there for those 
with initiative, a little imagination, and money. 

The psychologist who reads about these fantastic machines cannot help but 
wonder how they will affect his own activities. Is it conceivable that hardware 
advances in computers and new schemes for programming the machines will produce 
some kind of super-robot which has the capacity to solve problems, to invent new 
ideas, to replace the human brain in many operations? Admittedly, a machine 
can greatly surpass the human being right now in arithmetic computations, and 
even in certain logical operations. But surely the area of creative thinking is sacro- 
sanct to man, at least in the eyes of individuals not intimately acquainted with 
recent advances in this rapidly growing field. No one is foolhardy enough to claim 
that existing machines can seriously compete with the human brain in most per- 
ceptual and cognitive activities. It has been said, for example, that even such a 
simple task as looking up a number in a phone book would require the equivalent
of nine high-speed IBM 704 computers to do the job in the short time that it takes
a human being (7). Nevertheless, recent work in the machine simulation of human
problem solving, notably that of Newell, Shaw, and Simon (5, 6), has already made
some significant progress in the construction of machines that can think. 

What are the immediate implications of these recent advances for the clinician? 
Is it likely that the computer will supplant the clinician in certain activities? In 
searching for an answer to this question, let us first examine in detail what a clinical 
psychologist does, both in his everyday, practical activities and in his role as re- 
search scientist. 

Whether in a research or service setting, the unique quality that the clinician 
has to offer is himself. Trained to focus upon the steady stream of interaction be- 
tween himself and the person with whom he is working, the clinician constantly 
revises his image of the individual’s personality, adapting his behavior to that of 
his client, reasoning intuitively without full awareness of his own thought processes, 
gradually sharpening his concept of the person with whom he is interacting — in 
short, using himself as the primary instrument for collecting, processing, and inter- 
preting information about his client. This intuitive, interactive process is the method 
par excellence of the skilled clinician, whether he is engaged in psychotherapy or is 
concerned with the evaluation of personality for some other purpose. It is difficult 
to imagine how a machine could replace a clinician in his primary role of a person 
interacting with another person. Surely in this respect the clinician is safe from any 
job displacement by machines. 

Granted that the unique quality of the clinician is his face-to-face role inter- 
acting with the client, what are some of the other functions performed daily by the 
clinician where the introduction of a machine might supplant certain activities? 
Aside from counseling or psychotherapy, the main role of a clinician is diagnosis 
and evaluation of the individual and his personality. As a diagnostician, the clinical 
psychologist’s activity can be separated into three different phases: (a) the collection 
of information about the person; (b) the preparation and translation of this information


¹Paper read at symposium on the impact of computers on psychological research, held at the
annual meetings of the American Psychological Association, September 7, 1959, Cincinnati, Ohio.
for analysis; and (c) the interpretation of this information. All three phases
occur simultaneously when the clinician is using himself as an instrument in face- 
to-face contact with the client. In most settings, however, the clinician spends a 
major portion of his time and energy focussing upon one of these three phases at 
a time. 

Let us examine each of these three aspects of the diagnostic process one at a 
time with respect to three questions: What is the nature of the clinical activity? 
Is it theoretically feasible for a machine to take over all or part of the clinician’s 
function? If theoretically feasible, how practical is the machine in economic com- 
petition with the human being? 


Clinician Versus Machine as a Collector 


The collection of information in clinical diagnosis runs all the way from self- 
report forms and tests that can be given by a clerk, to depth interviews and pro- 
jective techniques that require the clinician-subject interaction. While it is conceiv- 
able that such routine information as simple biographical items or responses to 
objective-type tests such as the MMPI could be obtained by completely automatic 
methods once the subject is introduced to the machine, it will probably remain 
impractical from an economic standpoint for some time to come. Clerks and low- 
level technicians are easy to come by. 

As one moves through the range of possibilities to the interview and projective 
techniques, the machine is not only impractical from an economic point of view, 
but it is also theoretically unfeasible because of the interpersonal nature of the 
interview or testing situations. While there are many ways in which the clinician 
can utilize clerks and technicians to assist him in his role as a collector of information, 
it is highly unlikely that machines will play an important part in this initial phase 
of the diagnostic process. 


Clinician Versus Machine as a Processor 


A good deal of the clinician’s time is spent in the scoring, coding, and analysis 
of information to prepare it for interpretation and evaluation. This processing of 
information ranges from the routine counting and scoring of items in a test to the 
more intuitive analysis of interview and projective materials. From the clinician’s 
point of view, the diagnosis of personality calls for an intensive qualitative analysis 
of many subtleties within test protocols, as well as the compiling of more objective 
scores and signs from the available information. If the clinician is viewed as a 
free-floating processor with no hard and fast rules, if he typically generates his 
categories of description and analysis on an ad hoc basis as he builds his picture of 
the unique personality before him, and if he is generally unaware of his own thought 
processes during the analysis, then no machine can conceivably take over this 
function as a processor of information. 

But even granting the clinician his right to use the information of his choice 
and to make the analysis of his choice, the high speed computer can still be a useful 
servant in processing some of the information for the clinician. I daresay that 
every psychodiagnostician spends considerably more time than he cares to admit, 
in processing his raw data by means that are really better adapted for a machine 
than for the human brain. To illustrate my point, let me cite some uses to which 
we have put computers in the processing of information in the responses to a new 
inkblot test developed in our laboratories. 

The Holtzman Inkblot Test consists of two parallel forms, each containing 
45 inkblots to which the subject is asked to give only one response (2). These two
major changes in the Rorschach procedure, lengthening the test and instructing 
the subject to give only one response per card, make it possible to process the data 
along psychometric lines without sacrificing the rich, projective quality of the 
testing procedure. At present 22 variables are scored for each of the 45 cards in 
the test. These variables cover a wide range of concepts varying from Location, 
Form Appropriateness, Color, and Movement, to such content variables as Anxiety, 
Hostility, and Anatomy. As in many other clinical tests, the scoring of the verbal
record is the first step in processing the data. While it may be theoretically possible 
for a large computer to score most of the raw data, the cost would be completely 
prohibitive. At this first stage in analysis, the clinician or, better still, his trained 
assistant can run rings around the best computers. But once this primary codifying 
is completed, the situation is reversed. 

The scores are entered as digits in boxes following each response on the printed 
test booklet. The scored protocol is given to a key-punch operator who punches 
one IBM card for each response, yielding a total of 45 cards. The cards are fed to 
an IBM tabulator which punches out a summary card containing total scores for 
the 22 variables as well as miscellaneous identifying information. Of course this 
particular job could be done faster by an electronic computer, but at this stage in 
processing, the cheaper, more accessible tabulator is highly satisfactory. If a simple 
summation of the individual scores completed the information-processing phase 
of the analysis, the machine would have only a slight edge on the human being, 
and then in terms of speed and accuracy. Certainly the machine would be imprac- 
tical for everyday use. 

But suppose after several years of research, a great deal is learned about the 
test that has diagnostic value. And suppose further that this acquired information 
deals with many complex configurations of scoring elements, as well as their simple 
summation into single, isolated scores. Isn’t this focus upon multiple signs and 
complex configurations precisely the approach that most clinicians believe consti- 
tutes one of their important, unique contributions as processors? 

Once the basic scoring elements are punched on cards, any conceivable config- 
uration of the scores can be computed by machine with perfect accuracy and high 
speed. A large number of such response patterns can be scored simultaneously and 
either stored in the computer’s memory for further analysis or printed out for use 
by the interpreter. If enough research has been completed on the test to indicate 
what kinds of configural scores are diagnostically significant, the computer can go 
one step further. Prior to running the response data through the computer, a diag- 
nostic reference table containing all the significant information from available 
research is stored in the computer’s memory. As the basic scoring elements are fed 
to the computer, it generates signs and configural scores, scans its memory for 
relevant information in the diagnostic reference table, assigns appropriate weights 
to the significant scores, and prints out a summary for further interpretation by 
the clinician. 
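
The lookup-and-weight step described above can be sketched in a few lines of modern Python. This is only an illustration under stated assumptions: the sign names, thresholds, and weights in `REFERENCE_TABLE` are invented placeholders, not findings from research on the inkblot test, and the variable names merely echo the scoring categories mentioned earlier.

```python
# Hypothetical diagnostic reference table. Each entry names a sign, the
# summary variable it depends on, the minimum total score that counts as
# "present", and a weight. All names and numbers are invented for
# illustration; none come from actual research on the test.
REFERENCE_TABLE = [
    ("high_anxiety", "Anxiety", 10, 2.0),
    ("high_hostility", "Hostility", 8, 1.5),
    ("high_movement", "Movement", 20, -1.0),
]


def summarize(totals):
    """Return the signs present in `totals` and the sum of their weights."""
    present = [
        (name, weight)
        for name, variable, minimum, weight in REFERENCE_TABLE
        if totals.get(variable, 0) >= minimum
    ]
    composite = sum(weight for _, weight in present)
    return {"signs": [name for name, _ in present], "composite": composite}


# Summary-card totals for one invented protocol.
totals = {"Anxiety": 12, "Hostility": 3, "Movement": 25}
summary = summarize(totals)
print(summary)
```

The printed summary is meant to be read by the clinician, who remains free to accept or overrule it; the sketch covers only the table lookup and weighting, not the interpretation.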

Granted that it is theoretically feasible for the computer to handle this import- 
ant kind of problem, how efficiently can the task be done by machine as compared 
to the human brain? Once the psychologist has specified the nature of the desired 
signs and configural scores so that an adequate computer program can be written, 
the machine should easily outstrip the human brain in both speed and accuracy. 
Recently, my assistants, Edward Moseley and Gary McCollough, wrote a 650 general 
purpose program for configural scoring. Any conceivable pattern involving any 
number of variables up to a total of 22 can be recognized and counted. Fifty patterns 
can be scored simultaneously in one sweep of the cards through the 650 computer. 
At present the method is unduly slow because of its general nature and lack of 
optimal programming, taking several seconds to process one card. With a little 
more work, it should be possible to reduce the time for configural analysis to the
read-in time of the data cards, about three responses a second. Can you imagine 
yourself as a clinician counting simultaneously fifty complex patterns involving a 
number of variables in each of 45 responses to an inkblot test? Obviously the machine 
has a great advantage over the human being for this type of information-processing. 
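
A minimal sketch of this kind of configural counting, written in present-day Python rather than for the 650, may make the idea concrete. It assumes each response card is a mapping from variable name to score and each pattern is a list of score ranges that must all hold on the same card; the variable names, scores, and the two patterns shown are hypothetical and are not taken from the program described above.

```python
def matches(card, pattern):
    """True if every (variable, low, high) condition holds for this card."""
    return all(low <= card.get(variable, 0) <= high
               for variable, low, high in pattern)


def count_patterns(cards, patterns):
    """Count, in a single pass over the cards, how often each pattern occurs."""
    counts = {name: 0 for name in patterns}
    for card in cards:
        for name, pattern in patterns.items():
            if matches(card, pattern):
                counts[name] += 1
    return counts


# Two invented patterns; a real run might hold up to fifty at once.
patterns = {
    "movement_with_hostility": [("Movement", 2, 4), ("Hostility", 1, 3)],
    "poor_form_with_color": [("FormAppropriateness", 0, 1), ("Color", 2, 3)],
}

# Three invented response cards; a full protocol would have 45.
cards = [
    {"Movement": 3, "Hostility": 2, "FormAppropriateness": 2, "Color": 0},
    {"Movement": 0, "Hostility": 0, "FormAppropriateness": 1, "Color": 3},
    {"Movement": 4, "Hostility": 1, "FormAppropriateness": 0, "Color": 2},
]

counts = count_patterns(cards, patterns)
print(counts)
```

Because every pattern is checked against a card while that card is in hand, the work grows with the number of cards times the number of patterns, and the cards need to be read only once.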


Clinician Versus Machine as an Interpreter 


Closely following on the heels of information-processing is the problem of 
interpretation. This third and final phase of the psychodiagnostic process consists 
of one or more of the following — description and classification; prediction of future 
behavior in either a general or specific sense; and post-diction of past behavior or
events. The interpretation may be in a simple form such as, “He is a schizophrenic
individual.” Or it may consist of highly complex, contingent predictions such as,
“His severe anxieties will diminish if he consults a female psychotherapist, if his
wife doesn’t leave him, and if he continues to hold his job.” The nature of the
interpretations may range from an actuarially-based decision to an extensive clinical
evaluation. Obviously for some kinds of interpretation the machine is completely 
out of place. For others, however, it may prove a highly useful adjunct to the 
clinician. 

Most of you are aware of the debate that has been raging the past several 
years over the relative merits of actuarial methods versus clinical methods in psycho- 
diagnosis or the prediction of behavior. Somewhat the same kinds of arguments 
can be applied in considering the clinician and the computer. If you side with Paul 
Meehl (4) regarding the value of a cook-book method as opposed to rule-of-thumb,
then obviously the computer has an important future in the interpretation as well 
as the processing of information. Base rate information can be stored in memory 
for various reference groups of diagnostic interest, together with probability values 
and cutting points for making a decision. If you side with Holt (1) or McArthur (3),
the machine will serve only a minor role, if any, as an interpreter, since the clinician 
is completely free to over-rule any of the evidence in favor of his own judgment. 
In either case, as yet the computer has not proven economically practical as an 
interpreter. 
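
One way stored base rates and cutting points might be combined into a decision is sketched below in Python. The use of Bayes' rule here is an assumption made for illustration, not a procedure given in the text, and every number (base rates, hit rate, false-alarm rate, cutting point) is invented.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(member of group | score beyond the cutting point), by Bayes' rule."""
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)


def decide(score, cutting_point, base_rate, hit_rate, false_alarm_rate):
    """Assign the diagnosis only if the score reaches the cutting point and
    the posterior probability of group membership exceeds one half."""
    if score < cutting_point:
        return False
    return posterior(base_rate, hit_rate, false_alarm_rate) > 0.5


# The same test sign, applied to two invented reference groups.
common = posterior(0.50, 0.80, 0.20)  # condition present in half the group
rare = posterior(0.05, 0.80, 0.20)    # condition present in 5 per cent
print(round(common, 3), round(rare, 3))
```

With these invented figures the same score supports the diagnosis in the group where the condition is common but not where it is rare, which is why the base rate for each reference group has to be held in memory alongside the cutting point.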

But there is no reason why it cannot be made highly practical in the future, 
even with existing computer hardware, if anyone cares to work out the necessary 
programs for this purpose. The truth of the matter is that there are mighty few 
psychodiagnostic situations in which enough is known of the factors influencing 
the outcome to provide the parameters necessary in the computer’s memory to make 
intelligent interpretations. The same problem of validity plagues the clinician, 
but at least he can lean back on his subjective experiences and intuition, both of 
which are conspicuously lacking in the computer except in a trivial sense. 

Let us return to our original question. Can the computer supplant the clinician? 
When one reviews the many functions of a clinical psychologist in his role as psychodiagnostician,
it is obvious that the answer is, “Yes, but only in part.” As a collector
of information, the computer has little to offer that cannot be done better by the 
clinician or his assistant. As a processor of information, machines can greatly surpass 
the human brain once the primary coding of information has been done by the 
clinician or his assistant. As an interpreter of information, once again the clinician 
has a definite edge over the computer, at least until much more research has been 
undertaken which gives us the rules of interpretation and the essential parameters 
for a valid decision. As Meehl has said “: ».2), “The clinician... acts as an 
inefficient computer, but that is better than a computer with certain major rules 
completely left out (simply) because we can’t build them in until we have learned
how to formulate them.” 


REFERENCES 


1. HOLT, R. R. Clinical and statistical prediction: A reformulation and some new data. J. abnorm.
soc. Psychol., 1958, 56, 1-12.

2. HOLTZMAN, W. H. Objective scoring of projective tests. In Bass, B. M. & Berg, I. A. (eds.)
Objective approaches to personality assessment. Princeton, N. J.: D. Van Nostrand Company, 1959.
Pp. 119-145.

3. MCARTHUR, C. Clinical versus actuarial prediction. In Proceedings, 1955 invitational conference
on testing problems. Princeton, N. J.: Educational Testing Service, 1956. Pp. 99-106.

4. MEEHL, P. E. Wanted — a good cookbook. Amer. Psychol., 1956, 11, 268-272.

5. NEWELL, A., SHAW, J. C. and SIMON, H. A. Elements of a theory of human problem solving.
Psychol. Rev., 1958, 65, 151-166.

6. NEWELL, A., SHAW, J. C. and SIMON, H. A. The processes of creative thinking. Santa Monica,
California: The Rand Corporation, Revised January 28, 1959.

7. UHR, L. Latest methods for the conception and education of intelligent machines. Behav. Sci.,
1959, 4, 248-251.





SOME ASPECTS OF CLINICAL JUDGMENT¹
FRANKLYN N. ARNHOFF 


Mental Health Research Unit 
New York State Department of Mental Hygiene 
Syracuse, New York 


Recently there has been an increasing self-consciousness on the part of the
clinical psychologist as to how well he can do his job, with the spectre of Paul Meehl 
and associates looming in the background, checking him against the endless regurgi- 
tations of a monster machine, which if it can’t do better can apparently, all too often, 
do as well. Whether or not the clinician awakens at night in a cold sweat from this 
Orwellian nightmare is related to a great degree to the clinician and his judgmental 
processes. 

The question of whose judgment is best, the clinician or a machine, is a pseudo- 
issue which has attained the proportions of a paranoid persecutory delusion for some 
because of affective involvement with the wrong issues. In defending against what 
some clinicians have felt to be a challenge to their birthright, only one minor segment 
of the total problem has been focused upon—the question of who, or better yet, 
what can predict or judge best. Lost in the ensuing maze of defense mechanisms and 
intellectual polemics are the major questions which confront us; the issues of pre- 
diction of what, and for what, and the confusion between explanation and pre- 
diction. These issues are basic to the definitions of psychology as a science, and the 
role and functions of the individual practitioner, in this case the clinician. 

It is unfortunate that the machine model has become so all pervasive, since it 
has taken on the role of model for human behavior and has quite subtly influenced 
the direction and function of much of clinical psychological research and thinking. 
To attribute this trend to the publication of “Clinical vs. Statistical Prediction” ™,
or to place any individual in the role of the villain, is quite erroneous, since while 
Meehl and others focused upon and highlighted a problem, its history is old and its 
ramifications great. 

In its short developmental history, clinical psychology has become relatively 
insulated and impervious to the mainstream of psychological thought, theory and
empirical findings, especially with regard to the study of clinical judgment, which for 
the most part, has progressed as a study of end products, irrespective of the body of 
theory and empirical findings established by decades of research in the general area 
of judgment. Consequently, far too many of the clinical judgmental studies have 
had to rediscover what has long been known, namely, that by dint of training and 
experience the laws of human behavior are not abrogated, and that the established 
factors influencing human judgment also apply to clinical judgment, and therefore, 
to the clinician as a functioning person. But what of these factors, and how do they 
relate to the question of man vs. machine? 

Let us first deal with the term “judgment,” to place it in a clear context to per-
mit better understanding of what we are dealing with and to open the door for the
discussion of its relevancies. Johnson “*: *) defines judgment as one aspect of think- 
ing, placing it in the general area of problem solving. Judgment is not productive, as 
it consists of the evaluation or categorization of objects of thought. Judgment so 
defined becomes an end point or product—the result of a chain of psychological 
operations of perceiving bits of data, coding, integrating, relating, weighing, etc., 
from which the judge not only decides upon the final category or categories of judg-
ment, but also most often, makes the decision as to what categories are even to be
considered. Consequently, at each stage of the process all the weaknesses and 
strengths of differential abilities and inter-individual differences are operant, and all 
the factors and variables related to the thought processes become pertinent. 


¹Paper read at the American Psychological Association Meeting, September 4, 1959, as part of
the symposium “Clinical Skills Revisited.” 
From the study of complex abstract judgments, Johnson “*: *) abstracted some
general principles which have particular relevance to a discussion of clinical judg- 
ment. 

1. The judge may not be able to point to the factors by which his judg- 
ments are determined. 


2. The judgment may be determined by an independent variable of which 
the subject is not aware, even when his attention is directed to it. 

3. If the judgment called for is difficult, judgment in terms of criteria 
other than the one assumed relevant is likely. 

4. When an abstract judgment is called for, an affective judgment is 
commonly given. 


5. Judgmental confidence increases as a function of distance from a 
category threshold. 


The most important characteristic of these general principles is the underlying 
emphasis on the process of judgment rather than on content or product per se, and 
it is the study of process that holds the promise of understanding of the clinical 
judgmental situation. Recently, Tagiuri®® stated that “Recent approaches to the
problems of social perception with few exceptions have been empirical, highly 
operational, often guided by rather simple hypotheses and with emphasis upon 
accuracy.” He goes on to argue that what is needed is a shift from concern with
correlations between products of social perception to examinations of the process. 

The first link in the process that is the clinical judgmental situation is not the 
client or patient, but the clinician, his values, and his conceptualization of his role 
and function. From this evolves his situational goals, his choice of tools, and in the 
judgmental research situation, his choice of tasks and their means of measurement 
by which he may attempt to evaluate clinical judgmental performance. It is at this 
level that I believe the question of man vs. machine or formula rears its head, since 
a restricted conceptualization of the applied clinical role may make it inevitable that 
the comparison be made between man and machine. 

It is often said that the aim of psychology is to control and predict behavior, and
the psychologist hastens to add that since he is a scientist, and scientists do not let
values intrude upon science, his interest in control and prediction is valueless.
Furthermore, as Criswell®) recently succinctly explained, the physical scientists 
have been held up as the scientific model for us to follow, with the psychologist’s 
fantasies fondly toying with the dream that someday we will have a machine model 
which will explain and fit human behavior which will not only please the older, more 
established scientific groups, but will win us the recognition of being true scientists.
As a consequence, the types of studies he evolves, and we deal here with clinical 
judgmental studies, are truly loaded in the sense that by the very nature of the tasks 
they set up, the situations they examine and their emphasis upon either end products 
or the correlations between end products, they demonstrate the underlying self-
concept and the attendant attitudes and values of the clinician: imperfect computers,
Hollerith machines, and formula applicators which will be replaced in the world of
tomorrow by a perfect machine that will handle an infinite number of rigidly quanti- 
fied variables, and will predict the unfolding of human behavior, even for the in- 
dividual case. If a formula can be devised, a machine perfected, and of course, the 
necessary grants obtained, it could all be fed into the machine and tomorrow would 
follow from what we know today. Only then will we be true scientists. Implicit in 
this self-concept is a philosophy regarding the uses of past knowledge which the 
psychologist in question may or may not ever consciously face; what is the function 
of past knowledge, in this instance the formulas we have devised, or the data we have 
fed into the machine? Is it to be used to predict tomorrow, or is it to be used to 
construct tomorrow? 

Depending upon the position one takes on basic issues such as these, the manner 
in which one sets out to investigate clinical judgment will unfold; whether he asks his 
colleagues to predict the kinds of behavior children would condone“, to predict
specific MMPI profiles or to judge personality characteristics as manifest on some 
test or another (see Taft “*) for an extended review of such studies) or, whether he 
will attempt to investigate the interactional and transactional processes that are 
operant in the judgmental situation, be it social perception, psychotherapy, or a 
laboratory study on stress or anxiety, situations in which, at our stage of knowledge, 
formulas are yet to be developed and a machine could only rearrange what is already 
known. It is sometimes forgotten that to program a machine, information must
already be known, and that to maximize its use, the known information must previously
have been found relevant to the task at hand. In his recent paper on Clinical and Statisti-
cal Prediction, Holt“ has given considerable emphasis to the matter of relevance of
data, and we shall return to it again later. 

Returning to clinical judgment and the study of judgment proper, it is safe to 
echo McArthur’s “*) comments that there are no adequate studies which demonstrate
what clinical prediction, and hence clinical judgment, can do. However, since we 
have maintained that knowledge and progress in the study of clinical judgment 
depend upon analysis and demonstration of the process itself, let us turn then to 
some of the factors which are operant in the judgmental situation. From the main- 
stream of psychological knowledge, studies in classical psychophysics and social 
psychology, it has been well established that context effects and the operation of 
judgmental anchors, ™) personality and attitudinal characteristics of the per- 
ceiver, © 17, 21) and social class factors and biases®: *’, as well as one’s system of 
values, to mention only a few relevant variables, not only contribute to the total 
situational variance, but are intimately operant as basic to the understanding of the
process that results in judgment. While it is of course quite possible to control these 
by experimental design, we thereby lose the information that the study of them can 
produce. Regardless of whether one tends to view their operation as nuisances to be 
contended with, or as strengths or weaknesses, they are the psychological concomit- 
ants of dealing with people, be they the judge or the judged. In any situation, they 
are the givens; the conditions, axioms, and assumptions, that set the stage and are 
the determinants of the chain of thought which eventuates in a judgment. In any 
one judge, at any one time, the effects of these often subtle factors are usually not 
conscious, and the judge may not accept them as operant even when they are called 
to his attention by another party. It would seem self-evident that maximal under- 
standing can only accrue by accepting these variables as operant in the clinician, 
and by investigating their relationships, functions, and interactions. 

On an even more primary level, intellectual factors are quite basic to any dis- 
cussion of thinking, despite the fact that their lack of consideration in much of the 
research on thinking would lead one to think otherwise. While it is probably true 
that attainment of a Ph.D. or an M.D. requires more than what is usually thought 
of as average intelligence, even for a group such as this, individual differences in the
components of intelligence are quite great. I refer here to factors such as awareness 
that a problem exists, ability to deal with abstract concepts, analyze and synthesize, 
perceive similarities, memory, etc., factors which we see as important in our patients 
and students, but fail to see as being relevant to research and understanding of the 
clinical judgmental process, or how they relate to the success or failure of the clin- 
ician in attaining his goals. The clinician’s dealing with the unique case depends to 
some extent upon memory ability, since to make a decision of “unique” means that 
a comparison has been made with all that has gone before, at least in his own exper- 
ience. What is not available to memory, is not available for comparison, and the 
judgmental product must, therefore, suffer. 

I have mentioned the matter of individual differences not only because they are 
logically important, but because in any study of clinical judgment, it soon becomes 
apparent that considerable differences exist between judges, differences which are 
attributed to a host of factors, not the least of which is clinical experience, although 
there is usually some attempt to take this into consideration in experimental studies 
by “equating” judges in terms of the time dimension. Interestingly enough, however,
when the matter of clinical experience has been examined, albeit often in terms of 
the type of loaded studies mentioned previously, rather than the more experienced 
clearly shining in their performances, it has been difficult to demonstrate its relevance 
to the task. In his review of the literature on “Ability to Judge People,” Taft “* 
concluded that one of the factors which fairly consistently failed to correlate with 
ability to judge people, was the factor of clinical experience. Admitting the in- 
adequacy of many studies, the unfair comparisons in terms of the machine or actuar- 
ial methods discussed more adequately by Holt“), we still are not able to dispel the 
feelings of inadequacy which arise from our poor showing as judges and predictors 
of people. In many respects, this is paradoxical since as experts in human behavior, 
and with considerable experience in dealing with people, we are unable to demon-
strate professionally with any consistency what we all demonstrate in our everyday
lives; the fact that the relative smoothness of everyday life reflects an ability on our 
part to be in some degree aware, and to judge correctly, the needs, feelings, wants 
and motives of others®®. It seems logical to conclude, that in many respects, the 
aspects of judgment which we utilize with varying degrees of success in our daily 
living are different from, or not included in our professional research role, and cer- 
tainly do not relate to the situations we set ourselves to demonstrate clinical judg- 
ment. Can it be that our constructs and operations are divorced from life? Or our 
theories divorced from practice? It is obvious that the matter of “experience” tells
us nothing about a judge and that consideration of time alone, just as using only 
this as the criterion of aging, forces us to deal with an artifact which tends to obscure 
more basic process variables. Whether or not a clinician with 12 years in the field 
is someone with twelve years of experience in growth and understanding in terms of 
variables found relevant to clinical tasks, or someone with 1 year of some sort of 
experience which has been repeated 12 times, certainly cannot be determined by 
considering only total experience as a function of time. 

Although the distinction between relative and absolute judgments has long been 
maintained in classical psychophysics as a methodological convenience, in reality, 
as we have seen, all judgments are relative, if only to our own prior experience. If 
one’s experience has been with a partial range of stimuli (in this context, let us say 
severe psychotics) the judgment of a stimulus which is accepted as representing a 
less extreme position on the scale (say adolescent maladjustment) can be expected 
to be different for this clinician, as contrasted to one whose experience has been with 
a complete range of stimuli or with a less extreme range. Hence, experience too can 
only be a meaningful variable if it is considered in terms of its theoretical and logical 
relevance to the judgmental task. 

The question of the relevance of the data which serve as the building blocks in 
the thinking and judgmental process has been imbedded in the preceding discussion. 
From the literature on clinical judgment one is led to conclude that many of the 
constructs used in formulating judgments and/or predictions, have been irrelevant
to the task at hand, since they have resulted in such low order success. While 
Holt “®) has mentioned the need for prior determination of the relevance of data for 
maximal utilization in the judgmental situation, I should like to pursue the relevance 
issue a bit further. Barzun“ has recently made a major point of what he considers to 
be the anti-intellectualism of science, of which only one aspect is the erection of dis- 
tinct barriers between disparate scientific groups which results not only in poor com- 
munication between groups, but also the tendency to look upon one’s own area as the 
root of all knowledge rather than as a link in man’s common search for knowledge. 

In the context of the present discussion, the failure to consider construct sys- 
tems and explanatory principles other than our own follows directly from this, since 
the relevance of other bodies of knowledge, and hence their established pertinence
to many of our judgmental and predictive tasks goes unknown. In the now famous 
Michigan projects “*) on selection for training in clinical psychology, as well as in the 
Menninger psychiatric selection projects,“ the addition of increasing amounts of 
data either failed to increase the validity and utility of the judgments, or raised
them only slightly. The additional data was, for the most part, still within a limited 
conceptual framework. While a variety of explanations have been propounded, the 
question of the relevance and range of the data is important. It is not only possible, 
but highly probable that many of the variables which we attempt to take into con-
sideration are irrelevant, artifactual and/or behavioral manifestations of fewer, more
basic determinants, so that fewer, well established variables may have far greater 
explanatory generality than is commonly expected “).

On the other hand, a broader base of operations in terms of other construct 
systems may prove far more relevant. As Holt“ indicated, “No matter how re-
markable clinical judgment may sometimes be, it can never create information where
there is none,” so that we must differentiate between what is data about a person or
situation, and what is information that is relevant to the particular situational 
judgmental goals. I mentioned earlier the possibility of our constructs being divorced 
from life, or our theories divorced from practice, and I believe there is much to 
support this, such as recent observations“: **) that predictions of post-hospital ad-
justment can often better be made by patients than by nurses, and by nurses better
than by professional staff. The explanations given were in terms of sociological con- 
structs; the normative nature of behavior, and Parsons’ concept of affective symbol- 
ism. In this light, it has been found that different groups utilize different norms and 
value systems in accepting and rejecting a person and his deviancies. Tagiuri“ has
also observed that appropriate behavior so often depends not upon the idiosyncratic 
characteristics of a person, but rather upon his role. The problem then of what is 
relevant to clinical judgment brings us again not only to the parent body of psycho- 
logical knowledge, but also to awareness and utilization of a broad base of knowledge 
from a variety of sources to provide the process of thinking with appropriate data 
which are relevant to human social functioning. While the lesson has been a long 
time in learning, explaining and judging human behavior in terms of a limited range
of psychological dynamisms, particularly emphasizing psychopathology, has only
limited utility, unless their relevance to other construct systems and other knowledge 
has been demonstrated. 


CLINICAL JUDGMENT: A CLINICIAN’S VIEWPOINT
FREDERICK C. THORNE 
Brandon, Vermont 


Problems of clinical judgment have been brought into sharp focus by an 
increasing body of research tending to prove that it is not what it has been cracked 
up to be. Under sharp attack by experimentalists and statisticians whose work 
seems to show that not only are clinical judgments not even reasonably valid but 
that mechanical systems and computer results are able to do the job significantly 
better, the clinician is hard-pressed to defend himself. This paper represents an
attempt to consider the issues from the viewpoint of the practicing clinician who 
presumably understands why he is doing things better than any other personnel 
can do. In thinking over the various issues, it has seemed desirable to emphasize 
some basic conclusions in the form of a series of postulates which will help to struc- 
ture the problems with which we are dealing. These postulates are offered as 
reference points which any study of clinical judgment should take into consideration. 


Clinical judgment is operationally defined as involving the ability to make 
good (sound) decisions after gathering and evaluating all pertinent evidence, weigh- 
ing possible alternatives in terms of past experience or normative probabilities, 
and arriving at problem solutions which reflect basic science orientations (the 
cultural value system against which scientists operate). Practically, it is admitted 
that the clinician rarely will have available all the facts pertinent to a decision or 
solution but must make reasonable assumptions or intelligent guesses from in- 
complete evidence concerning what the most likely of several possible occurrences 
might be. Although it has often been identified as such, clinical judgment involves 
more than just the mechanical (cookbook) application of currently fashionable 
theories to case materials. Clinical judgment consists in making a series of educated 
guesses (in the large number of situations where definitive facts are unavailable), 
resolving inconsistencies as the data collection process continues, reflecting self- 
critically on possible sources of error, and later refining the process to eliminate 
repetitious sources of error. 


POSTULATE I. AT ITS CURRENT STAGE OF EVOLUTIONARY DEVELOPMENT, BASIC
SCIENCE PSYCHOLOGY HAS NOT PROGRESSED TO THE POINT WHERE IT CAN PRO-
VIDE VALID SOLUTIONS TO MANY URGENT CLINICAL PROBLEMS.
It must be admitted emphatically that there are no exact scientific answers
to many pressing clinical problems. In most instances, clinical judgment is at
best only scientifically oriented in the sense that the clinical judgment is being
made by a clinician who is relatively well trained scientifically but who may not
have any valid scientific information to apply to any particular situation.

Clinical judgments should be labelled clearly according to the degree to which
they are founded on validated methods or facts, or reflect the most competent
“clinical opinion,” or are merely “educated guesses.” Each clinician should be very
realistic as to the bases of his clinical decisions, should communicate the bases of 
his decisions, and should not claim to be using scientific methods unless this is the 
case in fact. 

In spite of the contemporary respectability of anything connected with science 
as a source of authority and justification for decision-making, let us be very realistic 
in admitting that current clinical practice is an Art, at best supported by a rudi- 
mentary skeleton of scientific theory and information. Let us recognize not only 
the limitations implied in the rather low reliabilities of many current tests and 
methods, but also the fact that many current tests are able to discriminate between 
groups as a whole but have very little predictive value when applied to individuals. 
In a field where pathognomonic indicators are virtually nonexistent in terms of 
the valid applications of tests and measurements, we need to be very frank in ad- 
mitting that most of our judgments are based on other grounds besides scientifically 
proven facts. In our own clinical practice, we often come to the end of the day 
without ever being able to claim that anything we have done was based on estab- 
lished scientific facts. 
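
The distinction drawn above between discriminating groups and predicting for individuals can be made concrete with a small worked example. The following Python sketch is offered only as an illustration under stated assumptions: two equal-sized groups with normally distributed scores and equal standard deviations, a cutting point placed midway between the group means, and hypothetical values for the separation and the sample size. None of these figures comes from the studies discussed here.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def group_z(d, n_per_group):
    """z statistic for the difference between two group means.

    d is the separation of the means in standard-deviation units;
    each group has n_per_group members and unit standard deviation.
    """
    return d / math.sqrt(2.0 / n_per_group)


def individual_hit_rate(d):
    """Proportion of individuals correctly classified by a cutting
    point placed midway between the two group means."""
    return normal_cdf(d / 2.0)


if __name__ == "__main__":
    d, n = 0.5, 200  # hypothetical separation and group size
    print("group z statistic:", round(group_z(d, n), 2))
    print("individual hit rate:", round(individual_hit_rate(d), 3))
```

With these assumed values the group difference is statistically overwhelming (z of about 5), yet the cutting point classifies only about 60 per cent of individuals correctly, against 50 per cent by chance.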

All this means that the clinical judgment can claim to be superior to common 
sense or pure chance only in relation to problems where a valid application of 
scientific fact is possible. 


POSTULATE II. BEHAVIOR IS SO COMPLEXLY DETERMINED, BOTH IN TERMS OF THE
DEVELOPMENT OF AN INDIVIDUAL PERSONALITY AND IN TERMS OF THE INFINITE 
VARIETY OF ENVIRONMENTAL SITUATIONS, SO THAT EACH SPECIFIC PERSON- 
MEETING-HIS-ENVIRONMENT-AT-ANY-TIME-AND-PLACE REPRESENTS AN UNIQUE 
ETIOLOGIC EQUATION WHICH CAN BE STUDIED AND MANIPULATED ONLY IN TERMS 
OF ITS OWN INDIVIDUAL CHARACTERISTICS. 

Every clinical situation is characterized, essentially, by its “one of a kind”
nature. Attempts to classify clinical situations in terms of standard types, factors, 
theories or etiologic equations can be valid only to the degree that a genuine homo- 
geneity of common elements can actually be demonstrated. The complexity of 
human behavior causation is so infinite that it has not yet been described on theo- 
retical levels much less been programmed for computer analysis. 

At current stages of basic science psychological knowledge, only the most 
eclectically-oriented can even conceptualize (much less program computer systems) 
the multi-leveled clinical approach necessary to deal with all etiologic factors. 
At a time when we are only vaguely beginning to realize the complexity of possible 
etiologic equations, when clinical methods are not yet available for studying many 
levels of organization of personality integration, when we have only the slightest 
intimations of how the brain functions, it is premature to even consider valid any 
mechanical system of personality analysis as the basis for clinical decision making. 

Computer systems can be no more valid than the validity of the program 
which is fed into them. Until an eclectically-oriented integration of all pertinent 
theoretical and factual approaches to personality study on all levels of integration is
achieved, it is premature to consider that any computer system based on incomplete 
evidence can solve the issues any more than can invalid clinical judgments based 
on incomplete evidence. 

The art of clinical judgment becomes most evident in situations such as psycho- 
therapy where, in its infinite variations, the therapist’s responses may involve 
hundreds of clinical decisions within one therapeutic hour. The number of such 
decisions may be cut down in passive psychotherapy where the therapist just listens, 
or limited in nondirective therapy where the therapist just responds to feelings, 
but is almost infinite in eclectic therapy. Similarly, clinical judgments in diagnosis
are simplest where there is a choice between only two or few alternatives, but become 
very complex when the choice is between many alternatives as in the question 
of whom to marry. 

POSTULATE III. THE VALIDITY OF A CLINICAL DECISION CAN ONLY BE AS SOUND AS
THE VALIDITY OF THE FACTORS UPON WHICH IT IS BASED. 


The conscientious clinician must recognize that clinical judgments based on 
existing psychological, psychiatric and psychoanalytic theory fail more often than 
they succeed. In spite of all the studies of the nature of scientific methods, of our 
insistence upon the objectification and quantification of clinical judgments by 
suitable operational methods and semantic refinements, and of the requirement 
that every clinician should subject his own judgments to scientific study and 
validation, we are confronted by an astounding lack of criticality in the spoken and 
published words of our clinical colleagues. We are working in an era when the 
contributions and limitations of Behaviorism, instinct psychology, Gestalt psychol- 
ogy or psychoanalysis, are only tentatively becoming established. Every school and 
system possesses a cult of enthusiastic disciples and proponents all of whom are 
gambling the validity of their own clinical judgments on their faith in the validity 
of the underlying theoretical system. The faith shown by such innocents is indeed 
a reflection on their own scientific credulity and lack of breadth of experience with 
all systems eclectically. 

Against the background of recent developments in humanism and existential 
psychology, there may be discerned a tendency to glorify the speculative dialectics 
of the “original” thinker who seems to need to magnify his own existential value
by concocting a “bold” new theory which shakes off the manacles of conventionality
in striking out daringly into metaphysical realms. Thus we have from certain 
quarters a persistent demand for more journalistic outlets for theoretical contri- 
butions, an increasing preoccupation with speculative considerations, and a certain 
impatience with the journeyman scientist who sticks to his laborious experimental- 
statistical studies and eschews the flights of imagination which characterize some 
psychological journals. 

There can be no substitute for an increasing insistence upon basic science 
methods. Our area of study is so large, and our resources so meager, that it is both 
wasteful and rationally inconsistent to squander so much effort and resources on 
problems and methods of unproven validity. Not that we want to discourage 
imaginative theorizing and hypothesis formation. We merely insist that we must 
not go overboard on every unproven trial balloon which crosses the clinical horizon. 

In other words, the poor record of clinicians in research judgmental situations is 
a function of particular invalid judgmental processes and not of clinical judgment as 
a whole. In some cases, the failure was caused by attempts to apply invalid theoret-
ical systems, i.e., the clinician did his best in an invalid frame of reference. The wise
clinician should not allow himself to be persuaded to make types of judgments for 
which there is no valid basis. It is important to differentiate whether it is theories or 
judgmental processes which are being validated. 


POSTULATE IV. IN CONTEMPORARY PRACTICE, MANY TESTS AND METHODS ARE BEING
UTILIZED FOR PURPOSES WHICH ARE INVALID, I.E., TEST APPLICATIONS WHICH
MAY HAVE LIMITED VALIDITY IN RELATION TO SPECIFIC SITUATIONS BECOME IN- 
VALID WHEN USED FOR PURPOSES FOR WHICH THEY WERE NOT INTENDED. 


It is perhaps too revolutionary to expect contemporary clinicians to abandon 
current practices which are all they have to go on even if of doubtful validity. Con- 
temporary clinical psychology has a tremendous investment in perhaps a score of 
tests and methods including the Stanford-Binet, Wechsler scales, MMPI, TAT, 
Rorschach, sentence completion and association tests, figure drawing, etc. If any
appreciable number of these “standard” methods were to be suddenly discredited,
the average “cook-book” clinician would feel the ground torn out from under him.
But this is actually what has been happening, as an increasing weight of studies pro- 
vides cumulative evidence that a large number of current practices are not valid for 
the purposes for which they have been used. 

Meehl was correct in anticipating the increasing number of studies which have 
shown that many clinicians are not making better than chance predictions and could
better be replaced by machines which presumably could come up with the right 
answer 50% of the time. Personally, we would have been very much embarrassed to 
be one of the clinicians in many of the studies who consistently did “worse” than
chance in making decisions. Perhaps this is one of the reasons underlying the discern-
ible reluctance of many clinicians to be caught making a clinical decision which might
be wrong, and their tendency to gravitate towards teaching or administrative positions where their
efforts will not be subjected to objective verification.

But in spite of all these remarks, we do not intend to be nihilistic. The great 
challenge of the next 20 years will be to refine the wheat from the chaff, to abandon 
most of what is already obsolete, and to relentlessly pioneer in new directions, ever 
enlarging the variety of operational methods of approach to multi-leveled phenom- 
ena. Here, we are going to risk the prediction that a thoroughgoing eclectic approach, 
studying behavior on all levels of organization of personality integration, is the one 
systematic position which has any prospect of eventual success. In our own clinical 
practice, we find that our judgments have validity only to the degree to which they
reflect this position. 


POSTULATE V. IN GENERAL, THE INDUCTIVE APPROACH TO CLINICAL JUDGMENTS IS
MORE VALID THAN THE DEDUCTIVE APPROACH. 


Classically, the nature of clinical diagnosis was considered to be the problem of 
classifying clinical syndromes by reasoning from the general to the specific. Given a 
classification system or theoretical viewpoint, the problem was to discover how this 
applied to the individual or into what group he could be classified. Operationally, 
this involved the super-imposition of a number of tracings or patternings over the 
data to see which one fitted. This is the method whereby we demonstrate that a
certain profile fits a certain group, and then assume that the individual is a mem-
ber of the group because he fits the profile. Hypothetically, this would be the ap-
proach of computer analysis which had a number of profiles programmed into it, and
which would then ingest clinical data and differentiate which pattern the case 
belonged to. Such an approach is of questionable validity for many reasons. First, 
there is not yet any general agreement as to the validity of any classification system. 
Second, few individual cases fit any general pattern exactly. Third, the etiologic 
equation of every case is different, being ‘‘one of a kind’’. 

A much more valid operational approach is the inductive method of arguing 
from the specific to the general. Rather than starting with preconceived patterns 
which we attempt to superimpose on the individual, we start with the individual data 
and try to arrive at an etiologic equation or profile which is distinctively his own. 
We submit that the demonstrated invalidity of many clinical judgments (following 
Meehl) is a function of erroneous attempts at deductive reasoning in applying 
generalizations which do not fit individual cases. Inductive diagnostic formulations 
would appear to have much higher validity since they are couched in terms of per- 
sonal dynamics, but for this very reason are not subject to standard computer 
analyses, being one of a kind whose validity must be evaluated in terms of prognostic 
outcomes. 


POSTULATE VI. THE ULTIMATE VALIDITY OF THE ART OF CLINICAL JUDGMENT DE-
PENDS UPON THE OBJECTIVELY-ORIENTED, CONSTANT EVALUATION AND REFINE-
MENT OF CLINICAL EXPERIENCE TO DETERMINE WHAT IS VALID AND TO ERADICATE 
THE INVALID. 


Clinical competence depends upon the constant refinement of cumulative ex- 
perience in terms of the latest discoveries of time and place. Within the self-critical 
orientation of a constant awareness of possible sources of invalidity, the develop-
ment of suspicion as to the validity of judgments about which we are too self-con-
fident, and the continual effort to eliminate sources of bias inherent in any theoreti-
cal system or school, the competent clinician is healthily self-critical of all his judg-
ments and constantly aware of possible sources of error.

Elsewhere in this symposium, Arnhoff has stressed that the simple accumulating
of experience over a period of time is not enough because it may be the wrong kind 
of experience which is being accumulated along with the perpetuation of unrecog- 
nized judgmental errors. Too many clinicians simply become more adept in parrot- 
ting the cliches of the particular school they identify with. They develop great sem- 
antic ingenuity in defending their systems which become unassailable to criticism 
because of the ironclad defenses which are erected about them. It is the self-reflective 
attitude, genuinely constructively critical, which characterizes healthy clinical 
judgment. 

In all clinical fields, the nature of clinical judgment is best revealed in the art of 
evaluating, integrating and drawing reasonable conclusions from incomplete data. 
Great variations exist in the clinical ability to evaluate incomplete data even in 
clinicians with comparable training and experience. Where data are complete, the 
solution is usually obvious, and presumably machines could tote up the result more 
efficiently. But the situation of having complete data rarely exists in clinical practice, 
and the more incomplete the data the more important is clinical judgment. Mechan- 
ical computer methods cannot cope with incomplete data particularly where critical 
facts may be unobtainable in the course of conventional methods. 


POSTULATE VII. THE ETHICAL JUSTIFICATION FOR MAKING A CLINICAL JUDGMENT
WHICH DOES NOT HAVE COMPLETE SCIENTIFIC SUPPORT AND VALIDATION IS THAT 
IT ONLY PROFESSES TO BE THE BEST THAT CAN BE OFFERED AT TIME AND PLACE. 


The nature of practical problems is that they demand a solution. Life cannot 
remain in a posture of paralysed indecision but rushes inevitably onward, either 
carried forward blindly by its own momentum or manipulated by human decisions. 
Other applied sciences such as clinical medicine have adopted the ethical standard 
that the most that can be expected of a clinician is that his judgments reflect the 
best which can be expected at time and place. The clinician is not expected to be a 
superman, making infallible judgments, or displaying perfect efficiency in his prac- 
tice. It is considered ethically sufficient simply to be able to function clinically at 
the level of standards of time and place. 

The basic point here is that society demands action from the clinician. The 
ivory tower theorist or researcher can sidestep any comparable demand for action by 
simply stating that he is not yet ready or has no valid basis for decision-making. The 
clinical situation is comparable to President Lincoln’s Civil War problem of finding 
a Federal general who would press decisive action and break the stalemate with the 
Confederate armies. The clinician who refused to make any decision not based on 
valid grounds would soon find himself out of a job because in the practical world a
state of paralysis is worse than making errors.

The important consideration is not that any clinician makes errors but that he 
corrects them, learning from experience, and gradually overcoming the defects of 
his own constitution and training. The conscientious clinician learns to recognize 
his own blindspots and incapacities, and declines to deal with types of cases for
which he must regard himself as incompetent. The conscientious young clinician
bows to the accumulated wisdom of the older clinician, and leaves no stone unturned 
to master proven skills of all sorts. Our conclusion here is that the clinician must 
leave no stone unturned in exposing himself eclectically to all types of training and 
clinical experiences since it is only by learning all that is known at time and place 
that he can turn himself into the compleat clinician.


POSTULATE VIII. CLINICAL DECISIONS MUST INEVITABLY REFLECT EXPEDIENCE IN
MANY SITUATIONS WHERE THERE ARE NO SCIENTIFICALLY VALID BASES FOR DE- 
CISION OR WHERE THERE ARE CONFLICTS OF VALUE SYSTEMS ALL OF WHICH HAVE 
SOME “RIGHTNESS”.

In life there are many situations which have no “correct” solution but in which 
some decision must be made. No matter what is done, it will not satisfy all the de-
mands of the situation or even please anybody. The practical solution of such situa-
tions is to make the clinical judgment which appears to solve the largest number of
considerations and is the best that can be accomplished at time and place. Such a 
decision carefully weighs the probable costs of alternative plans of action and sup- 
ports the course which appears most advantageous from all angles. 

Since many of the factors or demands which determine clinical judgment depend 
upon the feelings and values of the participants in the situation, it follows that they 
never can be displaced by mechanical decision-making. It is possible that a mechan- 
ical decision might be ‘‘correct’’ in terms of all the normative and actuarial factors 
involved and still please no one. Humanistic values rule out computer solutions. 

The clinician must be expedient in everyday practice to a degree entirely un- 
appreciated by the academician or ivory-towered theorist who can afford to stick to 
principles. In spite of loud proclamations and protests of support for “high prin- 
ciples”, morality, the “right”, etc., the fact is that many instrumentalities of society 
operate consistently according to principles of expedience in order to get anything 
done. Sociological studies have revealed how necessary it is to be an “organization”
man if one is to work in large associations where one must “get along”, particularly 
with superiors. Thus every practicing clinician must make some compromise be-
tween Simon-pure principles and expediency if he ever expects to get, much less to
hold, a job. This is an issue which is poorly understood by academicians and “pure”
scientists who show little patience or sympathy with clinicians who appear to be 
sacrificing their scientific birthright and engaging in ulterior practices. Simply to 
exert enough influence to secure and hold a private patient may require considerable 
expedience to say nothing of getting the confidence of the person who pays the bills, 
and the cooperation of those whose help must be secured to work out therapeutic 
objectives. The “pure” scientist may not be interested in decisions of this type, or 
approve of them, but they must be made. 


POSTULATE IX. EVEN IN VIEW OF THE ADMITTED INVALIDITY OR RELATIVE IN-
EFFICIENCY OF MANY CLINICAL DECISIONS, SOCIETY MUST DEPEND UPON CLINICAL
DECISIONS BECAUSE OF THE PRACTICAL AND ECONOMIC LIMITATIONS OF LIFE
SITUATIONS.


Society inevitably is confronted with the question of how much it can afford to 
pay for improved clinical or mechanical decision-making processes. Admittedly it 
would be desirable to have clinicians with the highest levels of training and exper- 
ience in every clinical position supported by the most efficient computer systems. 
Practically, the law of diminishing returns applies inexorably to the amount that
can be expended to secure any given result. Within the foreseeable future, the high- 
est levels of competence and technical equipment can be made available only for 
research problems of the highest priority. 

Such clinical questions as to which ward a patient is to be assigned, what doctor 
is best suited to treat him, how many hours of psychoanalysis it is profitable to invest 
on him, when he is to go home, with whom is he to live, where is he to work, etc., not 
only can have no computer solution in the foreseeable future, but probably would not
justify such investment even though available. 

The value of the clinician is that he is a portable tool, making available a wealth 
of experience and clinical judgment almost instantaneously, requiring no technical 
equipment to move about and install, and best of all, constantly improving the 
quality and efficiency of his judgments through the continuous refining of experience.
In war, society cannot afford to install computing equipment in every fox hole to 
decide what to do next even assuming its availability. Similarly, the demands of 
clinical practice require the making of billions of decisions by the trained labor force 
available supplemented where feasible by mechanical analysing systems. 

The clinicians have the responsibility to go on making decisions in any situation 
where their predictions are even 1 or 2% better than common sense or pure chance 
choices. In cases where the differential is relatively small, then society itself must 
determine how much it can afford to pay for such a premium. Decisions have to be
made to make the world go round, and often without adequate foundations. Let
us not be afraid to make them. 


POSTULATE X. IN ALL FAIRNESS, RESEARCH COMPARISONS OF CLINICAL JUDGMENT
WITH VARIOUS MECHANICAL SYSTEMS SHOULD UTILIZE ONLY THE MOST COM-
PETENT EXAMPLES OF EACH TYPE OF RESOURCE.


Researchers such as Meehl apparently have given insufficient consideration to 
the issue of discriminating which clinicians are capable of making valid judgments, 
or more exactly, which judgments of any clinician have any degree of validity. 

No one would deliberately utilize an experimental-statistical method of de- 
monstrated invalidity for use as a criterion in such research designs. However, 
experiments on clinical judgment have repeatedly averaged the results of good and 
poor clinicians and then been surprised to discover that clinical judgment, on the 
average, cannot make better than chance predictions. The crucial test of clinical 
judgment is to discriminate which clinicians can actually make valid judgments, and
then to compare the best clinical judgments with the best mechanical methods. 

In our opinion, researchers have been too polite in assuming that any clinician 
with a Ph.D., ABEPP certification, many years of experience, and a responsible 
position will inevitably be a good representative of the best clinical practice. Such 
criteria admittedly have their value, and are perhaps the best protection that a 
democratic society can erect, but they nevertheless leave unproven what the actual 
competence of any particular clinician is. Clinical ability is where you find it and 
it is not always found in those who are conformist enough or politically knowing 
enough to corral the important jobs, titles and awards. 

Another defect of the research comparing clinical judgments with computer 
results is that the tasks chosen typically have not been the ones where clinical judg-
ment shines at its best. Up to the present, such research has involved rather artificial
situations (such as blind matching of several different types of protocols produced by
different types of cases), or attempts to make prognostic predictions based on
theoretical generalizations which have not been proven valid (as predicting out-
comes on the basis of projective signs), or to make generalizations based on isolated
data taken out of context, or requirements that the subjects make global diagnoses from
data in which the pathognomonic signs have been omitted. An experienced clinician
would simply refuse to attempt predictions based on insufficient data.


CONCLUSIONS

Clinical judgments are here to stay since people must be dealt with personally 
and not by machines. A false issue is created when critics attempt to match isolated
clinical judgment with mechanical computer methods. Recognizing that the validity 
of clinical judgment as a whole cannot be any higher than the validity of scientific 
knowledge at any time and place, the emphasis should be placed on the analysis of 
individual judgment processes to discover wherein they are correct or in error. 

A clear distinction should be made between (a) clinical judgments which are 
in error because based on invalid theory, (b) clinical judgments which reflect im- 
proper application of valid theory, and (c) clinical judgments which reflect a failure 
to make a proper evaluation or weighting of all pertinent factors, data or applica- 
tions of theory. 





OBJECTIVE TEST FACTOR U.I. 23: ITS MEASUREMENT AND ITS 

BACKGROUND AND PURPOSES


Twenty years of research in this laboratory have aimed at a complete taxonomy 
of the human personality—the discovery, confirmation, and interpretation of the 
major factor-dimensions along which human beings differ.2 Sixteen such dimensions
have been established and interpreted in questionnaire and rating media of measure- 
ment, and more recent work ® has established some twenty dimensions in objective 
performance tests. Some of these tend to involve the same areas of personality as do 
questionnaires, but others have no known questionnaire counterparts and are thus 
considered to be an extension of the range of measurable personality. Another,
perhaps related advantage of the objective tests is their relative insusceptibility to 
motivational distortion; that is, they are disguised in purpose and relatively difficult 
to fake. ‘ 22) 

Once the sheer existence of a factor-dimension is confirmed, the next steps are to 
improve its measurement, and to interpret it in terms of the variables it involves 
and its relation to “real-life” criteria. The present paper aims to do this for U.I. 23, 
one of the twenty objective test factor-dimensions whose existence has already been 
demonstrated (Cattell & Scheier, in press“). Specifically, we propose: (a) to improve 
the measurement of U.I. 23 for adults and young adults by the discovery and de- 
velopment of tests which will be more highly loaded on (correlated with) the factor, 
and (b) to enrich its interpretation by relating it to the following real-life clinical 
criterion: “In a mental hospital and/or under treatment with a diagnosis of psycho-
neurotic vs. not in a mental hospital or in therapy.”

Historically, efforts to conceptualize and classify neurosis have resulted in a 
number of intuitive-descriptive concepts and categories. Unfortunately, these 
categories were not systematically and empirically interrelated until recently. With 
the advent of factor analysis for analyzing relations between many variables, it has 
become possible to refine much of the ore mined by these early intuitive approaches. 
Hence, a huge array of behaviors dubbed “neurotic”, for this or that reason, has been
empirically classified into progressively more homogeneous covarying unities. Cattell 
(4, 5. 7) and Eysenck “5. 16. 17) have traced the development of the present-day factor- 
ially-based concepts of neuroticism, linking the early, primarily clinical-rating, 
studies and the later researches employing self-reports and objective tests as well 
as ratings. 

The Maudsley hospital research conducted under Eysenck “*: !’- !8) has stayed 
close to the intuitively-based conceptions of neuroticism as these are expressed in 
clinical ratings. Essentially, Eysenck’s neuroticism studies analyze variables each of 
which is first shown (via t-test of difference) to discriminate between clinically-rated 
neurotics and normals. Then, in determining the location of the factor, he rotates 
to maximize the weight accorded one variable, viz., the clinical ratings, a variable the 
reliability of which is rarely determined and the validity of which Eysenck  him- 
self has regarded as poor. Moreover, in our view, he extracts too few factors to 
properly account for the variance of the tests analyzed. From the outset, therefore, 
this method predisposes to—or even necessitates—the eventual finding of only one 
neuroticism factor which will center on clinical conceptions of neurosis. In essence, 
then, Eysenck’s procedure permits some analysis of clinical intuitions into objectively 
measured components, but does not permit a really thorough analysis of the neuro- 
ticism criterion based, as it must be, on many variables in addition to ratings alone. 


1The research reported in this paper was supported by a grant from the United States Public 
Health Service, National Institutes of Mental Health, 


2The first two paragraphs of this paper are a synopsis of viewpoints and evidence presented else-
where “*) in much more detail.

By contrast, our position“) is that one ought first to concentrate on analyzing
actual relations among a wide range of tests covering the personality sphere and 
including known and putative measures of neuroticism. The first criterion in this 
approach is the actual factor analytic groupings among the variables themselves. 
The emerging factors are related to clinical and other external criteria only after 
they are firmly established among the tests themselves. With most of the twenty 
objective-test factors so far found, we are still primarily in the first phase of research, 
viz., confirmation of the existence of the factor dimensions in a wide sampling of 
tests. However, U.I. 23’s existence as a consistently appearing pattern among ob- 
jective tests has now been well-established by replication in eight separate factor 
analytic studies“), employing 315 variables, some administered repeatedly, for a 
total of 756 variable administrations. This wide coverage required that each test 
be fairly short, with the result that reliabilities and hence factor loadings have been 
considerably reduced. Therefore, the first purpose of the present study is to modify 
present U.I. 23 tests, and to discover new ones, with a view to increasing the size of
factor loadings. 
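
The connection drawn above between short tests, lowered reliabilities and lowered loadings can be illustrated with two standard results of classical test theory, neither of which is stated in the text: the Spearman-Brown formula and the ceiling which a test's reliability places on its correlation with any other variable. The Python sketch below is an illustration only; the function names and the reliability figures are hypothetical and are not taken from the studies cited.

```python
from math import sqrt

def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened or shortened by
    `length_factor`, assuming the items added or dropped are parallel."""
    r, k = reliability, length_factor
    return k * r / (1 + (k - 1) * r)

def max_correlation(reliability):
    """Upper bound on a test's correlation with any other variable
    (a factor included), under classical test theory assumptions."""
    return sqrt(reliability)

# Hypothetical figures: a test with reliability .80 cut to one quarter of its length.
full = 0.80
short = spearman_brown(full, 0.25)
print(round(short, 3), round(max_correlation(short), 3), round(max_correlation(full), 3))
```

Under these assumptions, shortening a test lowers the highest loading it could show on any factor, which is consistent with the stated aim of lengthening and modifying the U.I. 23 tests.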

Interpretations of the U.I. 23 dimensions included an hypothesis of relation- 
ship to clinical conceptions of neuroticism because: (a) U.I. 23 is loaded by tests in- 
volving behaviors which the consensus of modern clinical opinion regards as neurotic 
e.g., rigidity, effort intolerance, etc.: ', and (b) some of U.I. 23’s highly loaded
tests are similar to, or identical with, tests which seem to constitute the core of
Eysenck’s neuroticism factor: »». 5! #), which latter is known ® 17, 18. 20) to dis-
tinguish neurotics from normals. However, the present study is the first direct check
on U.I. 23’s relation to clinically-assessed neuroticism vs. normalcy. This check is 
necessary because the precise degree of similarity between Eysenck’s neuroticism 
factor and U.I. 23 has never been established by studies which analyze all of his and 
all of our variables together. Even now, however, the factors are demonstrably not 
identical, since several of Eysenck’s neuroticism tests are very similar to tests which 
have been found to load principally factors other than U.I. 23. Thus our prelimin- 
ary hypothesis is that U.I. 23 may represent some aspects of clinically-judged 
neuroticism, but probably will not exhaust the variance in this category. Eysenck’s 
results served initially to suggest this hypothesis, but its confirmation from now on 
will not depend on demonstrated similarity between his factor and U.I. 23. If we 
wished only to compare the clinical relationships of the two factors we would have to 
reproduce Eysenck’s samples of neurotics as closely as possible, or check both factors 
against the same clinical criterion groups. Our present purpose is not this, but rather 
it is the establishment of the relation between U.I. 23 and a criterion of neuroticism 
which is as representative as possible. As the next section indicates, this determines 
a method of sample selection somewhat different from that employed by Eysenck. 


PROCEDURE 


Subjects. Two groups, henceforth referred to as R6 and R8, were studied to provide 
data on the research problems described in the previous section. R8 consisted of 97 
S’s, all normals, including 77 U. S. Air Force males and 20 University of Illinois
undergraduate volunteers (14 male, 6 female). R6 included 98 paid S’s, 49 diagnosed
clinically as psychoneurotic and 49 normals matched with the neurotics as follows: 
13 males and 36 females in each group; mean age for normals was 32.5 years, for 
neurotics 32.6 years; age range for both groups 18 to 65 years; mean educational 
level was 11.4 grades (years) completed for normals, 11.2 for neurotics, ranging from 
8th grade to college graduate. Sigmas (as well as means) were virtually identical for 
the two groups on the matching variables. 

The classification “neurotic” was accepted solely on the basis of the judgment
of qualified clinical personnel. However, one cannot claim a representative sample 
of neurotics because clinicians themselves are not agreed on the nature and diagnosis 
of neurosis. We did attempt to center on a consensus of clinical opinion (a) by draw-
ing S’s from institutions which, for the most part, followed the American Psychiatric





OBJECTIVE TEST FACTOR U. I. 23: MEASUREMENT AND RELATION TO NEUROTICISM 137 


Association Committee on Nomenclature and Statistics” system for classifying 
neurosis, and (b) by sampling types of diagnosticians and persons diagnosed, i.e., 
neurotics were drawn from five Illinois state mental institutions and one privately 
endowed institution. Wide inter-psychiatrist and inter-institution differences in 
diagnostic preconceptions and in classificatory outcome have been found to exist for 
psychosis and anxiety °: "); it is reasonable to suppose that they also exist for neuro- 
sis. If so, the consensus our sampling of institutions achieved might actually have 
been a mélange of rather distinct populations of neurotics. This makes it all the more 
important to understand just what population or populations of neurotics our sample 
refers to. This will be suggested by reference to the age, sex, and educational data 
on the neurotic sample, and by consideration of the following other points: (a) The 
distribution is truncated as far as severity of neurosis is concerned. About 20 S’s
were eliminated if, after we consulted with the clinician-in-charge or after they 
attempted the tests, they were judged to be too disturbed to carry out the instruc- 
tions the tests required. On the low severity side we of course missed those neurotics 
who function effectively enough not to need treatment, and within treated neurotics 
our ratio of inpatients (presumably more disturbed) to outpatients was 36 to 13. 
(b) The APA committee’s®) conception of neuroticism appears to be more re- 
stricted than the traditional ones, e.g., it classifies as ‘sociopathic personality”, or 
“psychophysiologic disorder’’, etc., patients who might be classified as neurotic in 
less restricted conceptualizations. Our sample of neurotics included 15 Anxiety 
Reactions, 7 Obsessive-Compulsives, 4 Depressives, 14 in the psychoneurotic-other 
category, 2 each of Dissociative Reaction, Conversion Reaction, and Phobic Re- 
action, and 3 patients who, at the time of our testing, were regarded as “mainly 
psychoneurotic” by the clinicians-in-charge even though an earlier diagnosis had 
classified them as passive-aggressive personality which the APA committee con- 
siders as distinct from psychoneurosis, per se. (c) All neurotic S’s were receiving 
treatment free, or at low cost, which would tend to result in lower socio-economic 
status for the sample relative to the U.S. population of neurotics. 

We can think not only of representative sampling from normal and neurotic 
populations, but also of accurate representation of the population relationship be- 
tween neurotics and normals. Our matching of neurotics and normals is defensible 
in that it controls the effects of certain extraneous variables, but it may have pro- 
duced a “relational bias” especially in the case of educational level (which is sub- 
stantially correlated with intelligence). Evidence suggests that neurotics are lower 
than normals on educational level and intelligence. If this is so, matching neurotics 
and normals on educational level might bias the normal sample towards the neurotic 
or vice versa: That is, the “sample relationship” would be closer than the “popula- 
tion relationship” of neurotics and normals and all correlations between tests and 
this criterion would be attenuated relative to the value for the population of neu- 
rotics and normals. 


Method of Analysis. In both R6 and R8, a number of test variables were scored for 
each S, then analysed by the method of correlational extension analysis. There are 
two steps in this method: (a) S’s score on a factor-dimension is computed as the 
weighted sum of his scores on tests which previous factor analysis has found to load 
this factor highly. Thus, high motor-perceptual rigidity, low aspiration level, etc., 
have been found to load U.I. 23(—), and we can therefore estimate each S’s score on
the factor by a weighted combination of his normalized scores on these tests. (b) 
Correlations are then computed between each test variable in the study and the 
factor estimate computed as above. Each variable’s correlation with the factor 
estimate is assumed to be an accurate estimate of what its loading on the factor would 
be in full-scale factor analysis. The grounds for this assumption are that (a) factor 
loadings are essentially equivalent to correlations with a factor, (b) loadings and cor- 
relational extension values were found to be very similar when both were computed 
in one study for a given set of variables (unpublished data), and (c) studies on another 
factor have previously shown that a variable’s loadings are consistent between 
different studies whether computed by direct factor analysis or correlational ex-
tension analysis“: 4,
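
The two steps of correlational extension analysis described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' computing procedure: "normalized" scores are assumed here to mean z-scores, and the test names, weights and scores are hypothetical.

```python
from math import sqrt

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

def factor_estimate(markers, weights):
    """Step (a): each subject's weighted sum of standardized marker scores."""
    z = {name: standardize(scores) for name, scores in markers.items()}
    n_subjects = len(next(iter(markers.values())))
    return [sum(weights[name] * z[name][i] for name in markers)
            for i in range(n_subjects)]

def extension_loadings(variables, estimate):
    """Step (b): correlation of every variable with the factor estimate."""
    return {name: pearson(scores, estimate) for name, scores in variables.items()}

# Hypothetical scores for five subjects on two marker tests and one new test.
markers = {"rigidity": [1, 2, 3, 4, 5], "aspiration": [5, 4, 3, 2, 1]}
weights = {"rigidity": 1.0, "aspiration": -1.0}  # illustrative signs for the (-) pole
new_tests = {"new_test": [2, 1, 4, 3, 5]}

estimate = factor_estimate(markers, weights)
loadings = extension_loadings(new_tests, estimate)
print({name: round(r, 2) for name, r in loadings.items()})
```

Each value in `loadings` plays the role of the estimated factor loading for that variable; the estimate is only as good as the marker tests and weights fed into step (a).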

Thus, correlational extension analysis is accepted as a brief, easy and accurate 
way of estimating the loading values which would be achieved in a primary factor 
analysis. In the R6 and R8 studies, Pearson product-moment correlations were com-
puted between all variables and an estimate of U.I. 23, the factor criterion, which
latter was computed by a procedure to be described in detail in the next sections. In 
R6, Pearson product-moment correlations were also computed between all variables 
and a clinical criterion of neuroticism vs. normalcy. To form this clinical criterion 
distribution, a value of “2” was entered for each of the 49 neurotics, a value of “1”
for each of the 49 matched normals. Hence, a positive correlation between a variable 
and this clinical criterion indicates the degree to which high score on that variable is 
associated with presence in the neurotic rather than the normal classification. 
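
As a minimal sketch of the clinical criterion described above, the dichotomy can be coded 2 (neurotic) and 1 (normal) and correlated with any test score. The test scores below are hypothetical; the point illustrated is that the sign of r shows which group scores higher, and that the particular pair of code values (2/1 or 1/0) does not change r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation from raw scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def clinical_criterion(n_neurotic, n_normal, high=2, low=1):
    """Criterion distribution: `high` for each neurotic, `low` for each normal."""
    return [high] * n_neurotic + [low] * n_normal


# Hypothetical scores: four neurotics followed by four normals.
scores = [5, 6, 7, 8, 3, 4, 5, 6]
r_21 = pearson_r(clinical_criterion(4, 4), scores)
r_10 = pearson_r(clinical_criterion(4, 4, high=1, low=0), scores)
print(r_21, r_10)  # both positive and equal: neurotics score higher here
```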


In the R6 study, a total of 119 variables of three major types were em-
ployed:

(a) The Clinical Criterion variable described above. 

(b) The Factor Criterion, composed of 10 separate but related estimates of
score on the objective test factor U.I. 23 (—). (Detailed procedures are dis-
cussed in the next section.)

(c) The Established and Putative Tests of the Factor, some 108 in all. These in- 
cluded (i) tests previously known to load the factor (These were included 
in order to check on, and improve, their relation to the factor and discover 
their relation to the clinical criterion), (ii) newly developed tests which it 
was hoped would load the factor, and (iii) tests previously known or be- 
lieved to be related to the clinical criterion, these being included in order to 
see if they had relation to the factor and to check on their relation to the 
clinical criterion. 


Many of the above variables were only relatively minor variations in the scoring 
of a given test. We could thus determine which score had the highest correlation 
with the factor and hence would most likely provide the best measurement of the 
factor. 


In the R8 study, a total of 47 variables of two types were employed:

(a) The Factor Criterion, composed of 10 estimates of U.I. 23 (—). 

(b) 37 other variables of the type described in (c) above, emphasizing variables
whose relationship to U.I. 23 was high in R6, but in need of further repli-
cation.*


In the R6 and R8 studies, the U.I. 23 factor criterion was the composite of S’s
normalized score on nine “marker” variables; i.e., variables previously found to load
the factor relatively highly and consistently. In order to estimate the relation be- 
tween each marker and the factor without artificial enhancement of correlation due 
to self-correlation, there were nine factor estimates, each based on 8 markers (i.e., 
omitting the particular marker whose uncontaminated correlation with the factor 
needed to be determined). Thus, the rigidity marker was correlated with the U.I. 23
estimate not having contribution from the rigidity test, etc. A final factor estimate 
(making a total of 10) based on all nine markers was the factor criterion with which 
all non-markers were correlated. Table 1 lists variables used in the R6 factor estim- 
ate, together with their normalized score weights, and the basis for this in terms of 
average size and deviation of loading, and number of confirmations in previous re- 
search. With minor exceptions these are the best measurements of U.I. 23 found 
prior to the R6 and R8 studies. The next section will discuss progress in U.I. 23 
measurement achieved in the R6 and R8 studies. 
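
The scheme of ten factor estimates described above (nine estimates that each omit one marker, plus one based on all nine) can be sketched as below. The function name, marker names, scores, and weights are hypothetical, and the scores are assumed to be already normalized.

```python
def leave_one_out_estimates(z_markers, weights):
    """Return one estimate per omitted marker, plus an 'all' estimate.

    z_markers maps marker name -> list of normalized scores (one per subject).
    estimates[m] omits marker m, so correlating m with estimates[m] avoids
    inflating the correlation by self-correlation.
    """
    names = list(z_markers)
    n = len(z_markers[names[0]])

    def composite(included):
        return [sum(weights[m] * z_markers[m][i] for m in included) for i in range(n)]

    estimates = {m: composite([k for k in names if k != m]) for m in names}
    estimates["all"] = composite(names)  # criterion for all non-marker variables
    return estimates


# Hypothetical normalized scores for nine markers and five subjects.
z = {f"m{j}": [((i * j) % 5 - 2) / 2.0 for i in range(5)] for j in range(1, 10)}
w = {name: 1.0 for name in z}
est = leave_one_out_estimates(z, w)
print(len(est))  # nine leave-one-out estimates plus the all-marker estimate
```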

*A full description and specimen copies of most tests used in this study may be obtained else-
where.

TABLE 1. THE R6 FACTOR CRITERION VARIABLES AND WEIGHTS USED IN ESTIMATING U.I. 23 (—)
(BY HYPOTHESIS THE NEUROTIC-RELATED POLE OF U.I. 23)

Columns: (A) title of test variable in U.I. 23 (—) direction, with master index (M.I.) variable
number for cross-reference in other researches; (B) average loading on U.I. 23 (—), with the
number of studies on which the average is based in parentheses*; (C) average deviation of
loadings**; (D) normalized score weight for estimating the R6 factor criterion.

No.  (A) Test variable                                              (B)          (C)              (D)
1.   High motor-perceptual rigidity (M.I. 2)                        .31 (9)      .10              [illegible]
2.   High ratio of inaccuracy to speed (M.I. 120)
     (more errors per unit of work)                                 .34 (7)      .06              [illegible]
3.   Slower decision on particulars than principles (M.I. 151)      .21 (8)      .11              [illegible]
4.   Much excess of aspiration over preceding performance
     in “Coding” (M.I. 101b, 148)                                   .20 (5)      .18              [illegible]
5.   Fewer words correctly remembered in “Immediate
     Memory” test (M.I. 167)                                        .06 (5)***   .06              [illegible]
6.   Fewer letters written correctly in “Alphabet
     Skipping” test (M.I. 550)                                      .26 (1)      not applicable   [illegible]
7.   High ratio of writing with preferred to non-preferred
     hand (M.I. 538)                                                .26 (1)      [illegible]      [illegible]
8.   Fewer problems done correctly “Multiplying in One’s
     Head” (M.I. 542)                                               .25 (4)      .10              1
9.   Low absolute level of aspiration in coding (M.I. 638)          .56 (1)      not applicable   1

*This average of loadings is based on all young adult or adult studies in which the variables ap-
peared. Prior to R6-R8, there were nine researches in all, the eight previously reviewed (Cattell &
Scheier, in press), plus a study referred to in our research publications as R4.

**This indicates the scatter of the variable’s loadings, one from the other, through the number of
studies in which the variable appeared. Specifically, it is the average amount by which individual
loadings differ from the central tendency “average loading” value.

***Though loading very low in our previous studies, this test had correlated fairly substantially
with Eysenck’s similar factor. A modified and lengthened version of the test was therefore treated as a
marker in this study, i.e. nine of the ten factor estimates included it as a low-weighted marker
with the understanding that the tenth estimate would be used if the test failed to correlate sub-
stantially with the factor as estimated without its inclusion. It was then retained as a marker when its
correlation with the factor estimate based exclusively on the other eight markers proved to be .34.





RESULTS 


The Measurement of U.I. 23. In this section, we are concerned with the second and 
third columns of table 2 giving the correlations of variables with the U.I. 23 factor
criterion in the R6 and R8 studies. Table 2 confirms most but not all of the former 
U. I. 23 markers. Thus, table 1 markers #1, 2, 5, 6, 8 and 9 continue to associate 
highly with the factor in R6 and R8 (see table 2 rows 11, 5, 17, 7, 1, 4 respectively). 
However, former (table 1) markers #3, 4, and 7 fail to stand up as markers in R6 
(see table 2 rows 33, 29, and 32 respectively), with #3 and 4 actually reversing in 
expected sign. As prospective replacements for these markers, several variables tried
out for the first time in R6 have very high correlations with U.I. 23 (e.g., table 2
rows 3, 6, and 9). Since unreplicated associations should not be accepted as a basis
for a final factor-measurement battery, the best of these new measures were tried out
again in R8 as a check on their U.I. 23 relationship in a different sample of persons.
The prospective markers chosen for checking in R8 were selected not only on the 
basis of their R6 factor correlation, but also because they appeared relatively dis- 
similar to previously known markers. 
TABLE 2. CORRELATIONS* OF VARIABLES WITH U.I. 23 (—) FACTOR SCORE AND WITH NEUROTIC/
NORMAL CLINICAL CLASSIFICATION

Columns: variable title in U.I. 23 (—) direction of association**; correlation with the
neurotic-normal dichotomy, the “Clinical Criterion” (R6); correlation with the U.I. 23 (—)
“Factor Criterion” (R6 and R8). [The correlation values are illegible in the source; only the
variable titles are recoverable.]

1.  Few problems done correctly “Multiplying in One’s Head”
2.  Slow speed of “coding” performance
3.  Few done correctly “Upside Down Drawings”
4.  Low absolute level of aspiration in “coding”
5.  High ratio inaccuracy to speed: all relevant tests***
6.  Few done correctly “Where Do You Land”
7.  Few letters written correctly “Alphabet Skipping”
8.  Low total fluency “Completing Words”
9.  Few done correctly “Where Do the Lines Cross”
10. Few done correctly “Proofreading Figures”
11. High motor-perceptual rigidity
12. Slow speed “Reading to Oneself”
13. More “neurotic” points “Neurotic Symptom Checklist”
14. Low proportion “in-between” responses “Neurotic Symptom Checklist”
15. Low proportion correct “Finding the Longest Line”
16. Few years completed in school
17. Few words correctly remembered in “Immediate Memory” test
18. Few letters repeated “Completing Words”
19. Few write-in answers “What Answer Would [You] Make?”
20. Slow speed “Copying Compact Design”
21. More “neurotic” points “Attitudes and Interests” questionnaire
22. Few decisions made “General Decisions: Particulars”
23. Older chronological age
24. More “neurotic” points “MacMillan Inventory”
25. Low proportion [illegible] “General Decisions: Particulars”
26. Less compact writing “Writing an Essay”
27. Low proportion form (to color) “Preference for Pictures”
28. Few words written “Writing an Essay”
29. Less excess aspiration over preceding performance “Coding”
30. High proportion in-between responses “MacMillan Inventory”
31. [Illegible] ratio emotional to correct words remembered in “Immediate Memory”
32. Higher ratio writing preferred to non-preferred hand
33. Faster decisions on “particulars” than on [principles] “General Decisions”
34. Known marital discord
35. [Low intelligence]: “16 P.F.” factor dimension [B]
36. More will control: “16 P.F. Q3(+)”
37. Less dominant: “16 P.F. E(—)”
38. High proportion in-between responses “16 P.F.”
39. High proportion in-between responses on Composite Questionnaire

[Remaining rows; at least one row is missing in the source, so row numbers are omitted except
where fixed by the text.]
    [Illegible] self-sufficient: “16 P.F. Q2”
    [Less] experimenting: “16 P.F. Q1(—)”
    Less naive: “16 P.F. N(—)”
    Less practical: “16 P.F. M(+)”
    Less suspicious: “16 P.F. L(—)”
    Less realistic: “16 P.F. I(+)”
    Less confidence: “16 P.F. O(+)”
    Less super-ego strength: “16 P.F. G(—)”
    Less adventurous: “16 P.F. H(—)”
    Less ego strength: “16 P.F. C(—)”
    Less cyclothyme: “16 P.F. A(—)”
52. More “neurotic points” “Composite Questionnaire”
    Ergic Tension: “16 P.F. Q4(+)”

*With N = 98, an r of .20 is significantly greater than zero at the .05 level; an r of .26 is significant
at the .01 level.

**When the same new test was scored in a number of similar ways only the highest-loading scoring
was usually retained here; hence, table 2 contains fewer than the total number of variables originally
scored.

***The accuracy to accomplishment ratio was obtained from “Where Do You Land”, “Proof-
reading Figures”, “Where Do The Lines Cross”, “Upside Down Drawings”, “The Longest Line” and
“Multiplying” tests.

[Marker illegible] This test loaded the factor in the direction opposite to that expected from
previous research (see table 1).

In R8, a factor estimate was first obtained from five of the markers best con-
firmed in all studies up through R6. Another factor estimate was obtained from four
of the most promising tests newly discovered in R6. The new four-marker estimate
had a correlation of .85 with the old five-marker estimate (excluding variables 3, 4 
and 7 in table 1), thus indicating that the new markers as a group do indeed measure 
the factor well. Also, the new markers (table 2 rows 3, 6, 9, 10) were individually 
well confirmed as to size of loading in both R6 and R8. On the basis of this evidence, 
the factor estimate finally chosen for comparing other variables against U.I. 23 was 
the weighted combination of the five old markers plus the four new markers. As in 
R6, there were nine special estimates each corrected for auto-correlation with one of 
the markers, plus a tenth estimate including all nine markers. Table 3 describes the 
U.I. 23 factor-markers used in R8, and also, for completeness, gives data on some
variables not included in R6 or R8, but found to correlate rather highly with the 
factor in previous work. If one compares the size of relationships here with those 
known five years ago or even two years ago (table 1), it is evident that the factor
is now measurable with a much higher validity than formerly. Loadings of marker
tests have been improved by lengthening and modifying content and instructions,⁴
while highly saturated new markers have been discovered. As for the previously 
established markers, the table 3 values tend to underestimate the U.I. 23 validity 
of the latest improved form of a test since the average loadings are affected by the 
older, lower loadings from relatively primitive versions of the tests. If, even with this 
in mind, the reader still feels that the correlations are low, he should remember that 
the tests average only 3 to 4 minutes in duration. When combined in a 30-minute
battery they yield a multiple correlation (estimated elsewhere) with the factor of at
least .70 to .75.
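
To illustrate why a battery of short tests with modest individual loadings can still reach a substantial overall validity, the sketch below uses the standard formula for the correlation of an equally weighted composite of k standardized tests with a criterion. It is not the authors' own estimate: the formula assumes equal weights rather than a fitted multiple regression, and both input values are hypothetical (an average validity of .40 and an average intercorrelation among tests of .20).

```python
from math import sqrt


def composite_validity(k, mean_validity, mean_intercorrelation):
    """Correlation of a unit-weighted sum of k standardized tests with a criterion.

    r = k * r_v / sqrt(k + k * (k - 1) * r_t), where r_v is the average
    test-criterion correlation and r_t the average test-test correlation.
    """
    return k * mean_validity / sqrt(k + k * (k - 1) * mean_intercorrelation)


# Hypothetical inputs: nine tests, average validity .40, average intercorrelation .20.
print(round(composite_validity(9, 0.40, 0.20), 3))
```

The lower the intercorrelation among the tests, the more each added test contributes, which is consistent with the authors' preference for new markers that are dissimilar to the known ones.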

Old markers have been described in previous publications; some of the more
promising new ones are described briefly in footnote 5 below, and those which appear
to constitute the best factor battery will soon be available as a published test bat-
tery. (Selection of tests for this battery involved, however, several criteria other
than sheer size of loading.)

⁴It might be argued that the R6-R8 rise in some marker loadings is an artifact of the correlational
extension analysis method. But against this we have the arguments previously made for the similarity
between correlational extension and factor analytic relations of a given variable. Moreover, some of
the old markers dropped in loading between the primary factor studies and R6-R8, while others rose.
Finally, the lengthening and modifications made were of the type normally expected to raise loadings.
However, we agree that the final proof of improved measurement must rest on a direct factor analytic
check, one of which is now in progress in our laboratory.

⁵(i) “Where Do The Lines Cross?” requires that S visualize the intersection point of two imaginary
lines. End-points of the lines are indicated by reference to rather widely dispersed lettered points on a
chart (e.g. line A-C or E-R) while alternative possible intersections are indicated by numbers. S is
given only 6 seconds for each of a series of such [items] and thus suffers if he gets flustered in trying
to keep the [end-points] and intersections “in his [head].” He cannot trace the lines on the chart or move
any part of his body over it. (ii) “Where Do You Land?” also appears to require that S visualize a
situation under pressure and do speedy work. In this test the S must move visually up and down a
strip of ten squares in accordance with four simple rules governing movement from one square to
another. (iii) “Upside Down Drawings” is a simulated mirror drawing test. S must connect dots to
draw in the upside-down image of a series of simple line drawings. (iv) “Multiplying in One’s Head”
requires that S multiply in his head a number by the next number and write down the answer,
continuing the series as long as he can; thus, 5 × 6, 6 × 7, 7 × 8, etc. The test seems to demand that S
concentrate and persevere over a short time interval. (v) “Alphabet Skipping” is similar. S must
write every other letter of the alphabet, then every third letter, then every other letter beginning at
the back of the alphabet, and so on, without ever writing down the intervening letters. All the above
are tests.

TABLE 3. THE R8 FACTOR CRITERION AND THE BEST MEASURES OF U.I. 23 (—) TO DATE

Columns: (A) title of test variable in U.I. 23 (—) direction, with master index (M.I.) number for
research cross-reference; (B) average loading on U.I. 23 (—), with the number of studies on which
the average is based in parentheses; (C) average deviation of loadings*; (D) normalized score
weight used for estimating U.I. 23 (—) score, as used in the R8 estimate. [Column (D) is illegible
in the source and is omitted.]

R8 FACTOR CRITERION

No.  (A) Test variable                                                   (B)         (C)
1.   Fewer problems done correctly “Multiplying in One’s Head”
     (M.I. 542)                                                          .35 (6)     .15
2.   Fewer letters written correctly in “Alphabet Skipping” test
     (M.I. 550)                                                          .36 (3)     .11
3.   Slower speed, “Reading to Oneself” (M.I. 516)                       .27 (5)     .17
4.   High ratio of inaccuracy to speed (M.I. 120)                        .37 (9)     .06
5.   High motor-perceptual rigidity (M.I. 2)                             .32 (11)    .11
6.   Fewer correct answers “Where Do You Land?” (M.I. 604b)              .56 (2)     .01
7.   Fewer correct answers “Proofreading Figures” (M.I. 606b)            .39 (2)     .07
8.   Fewer correct answers “Upside Down Drawings” (M.I. 610a)            .58 (2)     .02
9.   Fewer correct answers “Where Do The Lines Cross?” (M.I. 609)        .48 (2)     .10

OTHER U.I. 23 (—) MARKERS; NUMBERS 12-16 WERE NOT IN R6-R8

10.  Low fluency in writing or speaking, #8 in table 2, somewhat
     different variable in earlier work, more like #28, table 2
     (M.I. 1, 271, 576)                                                  .29 (5)     .13
11.  Low absolute level of aspiration in coding (M.I. 638)               .54 (3)     .05
12.  Few done correctly, adding numbers on [illegible] (M.I. 199)        .25 (4)     .12
13.  [Title partly illegible: “... body ...ibility”] (M.I. 42)           .16 (?)     .07
14.  Poor two-hand coordination (M.I. 41)                                .13 (6)     .18
15.  Larger PGR to “Mental” than to “Physical” stimuli (M.I. 305)        .16 (3)     .10
16.  Low “dynamic momentum” or high “effort intolerance”
     (M.I. 28 or 505)                                                    .18 (3)     .12
17.  Few words correctly remembered in Immediate Memory test:
     redesigned in R6 (M.I. 167)                                         .10 (6)     .10

*See footnotes * and **, table 1, modified now as follows: (a) There are now a total of 11 studies,
(b) Average Deviation will, of course, increase as a test is improved in R6 and R8, and is thus spurious-
ly higher as applied to the current forms of tests like No. 1. [...] table 2 to compute average loading and
average deviation which applies best to current forms of many of these tests.

Thus far, we have concentrated on objective test measurement of the U.I. 23
dimension but there is no a priori reason why questionnaire measurements of the
factor cannot eventually be found. Cattell has shown that questionnaires and
objective tests can have precisely specifiable relations. The ease and speed of self-
report methods motivated us to seek questionnaire items which would measure U.I.
23 by relating it to the 16 P.F. Questionnaire. This questionnaire is a distillate
composed of the “best” of over 4,000 questionnaire items sampled in an attempt
to succinctly but comprehensively cover the range of personality characteristics
measurable by self-report methods. In three previous studies, none of the sixteen
questionnaire personality dimensions involved showed any substantial, consistent
relation to U.I. 23. The same result occurred for a fourth time in R8 where, with
the exception of 16 P.F. factor-dimensions B, Q3, and E, none of the factors had a
significant relationship to U.I. 23. Moreover, since the association of these three
factors with U.I. 23 is not confirmed by previous studies, it cannot be considered
as established.

In spite of the above results, it remained conceivable that special questionnaire
items could be devised to measure U.I. 23. The following questionnaires were there-
fore constructed and/or analyzed in R6: (1) A neurotic symptom self-checklist
where S marked himself above, below or at the average on a large sampling of symp-
toms clinically associated with neuroticism (table 2, #13). (2) A set of items con-
structed by the authors (table 2, #21), aiming to involve in questionnaire form the
characteristics which objective test markers suggested might associate with U.I. 23.
(3) The MacMillan neuroticism inventory, with an established substantial re-
lationship to clinically-judged neuroticism (table 2, #24). These questionnaires
averaged only about .25 correlation with U.I. 23 (see table 2), though the latter two
(#’s 21 and 24) correlated quite highly with the clinical criterion. In a final effort
to improve U.I. 23 questionnaire relationships, the thirty items with highest U.I. 23
associations in R6 were formed into a questionnaire. This questionnaire had a
virtually zero U.I. 23 relationship in R8 (table 2, #52). Perhaps some sample
selection effect in respect to neuroticism is responsible, but the point need not be
labored here. The main conclusion, both from 16 P.F. general coverage and the
special-purpose coverage, is that while some aspects of clinically-judged neuroticism
can be measured well by questionnaire, its U.I. 23 aspects cannot, at least in relation
to typical questionnaire loadings achieved on other factors. It appears that U.I. 23
characteristics are not readily introspectible, and hence that the factor must be
measured mainly by behavioral objective tests.

Although questionnaire categories of response (i.e., item contents) do not asso-
ciate well with U.I. 23, perhaps questionnaire response sets do. Previous data were
somewhat contradictory and unencouraging on this point, and so is the
R6-R8 evidence (table 2) as regards “in-between response set.” The direction of association with
U.I. 23 (and with the clinical criterion) actually reverses for in-between response set
on the neurotic symptom check list (table 2, #14) as compared with the other ques-
tionnaires (table 2, #’s 30, 38, 39). Plausible after-the-fact explanations can be
given for such reversals in any given case but, realistically, they would be difficult
to apply in the precise prediction of response set associations for other questionnaires,
especially for other samples of S’s. Naturally, we seek variables which will measure
consistently through a wide range of sample types. Hence response set in question-
naires must be regarded as not yet applicable to the measurement of U.I. 23.


The Relation of U.I. 23 to the Clinical Criterion of Neuroticism.* In R6, comparison
of the U.I. 23 factor scores of the 49 neurotics and the 49 normals yielded a t-ratio
significant at the .01 level, with neurotics at the U.I. 23 (—) pole, as expected.
Stated as a Pearson product-moment correlation (here a point biserial), the relation
is .27 between the nine-marker estimate of U.I. 23 (—) and the clinical criterion of
neuroticism. Although, as noted above, this correlation is significant at the .01 level,
it reveals how far from perfect the relation is. The conclusion that U.I. 23 is not
identical with the clinical notions of neurosis is therefore suggested by: (a) The fact
that the above correlation, while statistically significant, obviously is not high enough
to account for all the reliable variance in the clinical criterion (for example, correla-
tions as high as .47 and .52 occur with this criterion in table 2). (b) As noted pre-
viously, markers from Eysenck’s clinical-criterion-tied factor load on several factors
other than U.I. 23. (c) About half of the 16 P.F. Questionnaire dimensions have been
found to significantly discriminate between clinically-judged neurotics and norm-
als (18); yet none of these factors have any substantial confirmed relationship to
U.I. 23. (d) Recently analyzed data directly and strongly suggest that three other
factors relate to clinically-judged neuroticism at least as highly as does
U.I. 23.

*The reader interested in the degree to which individual tests discriminate between neurotics
and normals will find these data in the first column of table 2, as correlation coefficients. However, he
should remember that these data need replication, especially since individual tests are less likely to
behave consistently from sample to sample than are factors.
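
The significance statements for the .27 correlation (N = 98) can be checked with the usual conversion between r and t on N − 2 degrees of freedom. The sketch below is illustrative only; the two-tailed critical t values for 96 degrees of freedom (about 1.985 at the .05 level and 2.628 at the .01 level) are taken from standard tables and entered as assumed constants rather than computed.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for a Pearson (or point-biserial) r based on n subjects."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def critical_r(t_crit, df):
    """Smallest |r| reaching significance, given a critical t on df degrees of freedom."""
    return t_crit / sqrt(t_crit * t_crit + df)


# Assumed two-tailed critical t values for df = 96, from standard tables.
T_05, T_01 = 1.985, 2.628

t = t_from_r(0.27, 98)
print(round(t, 2), t > T_01)           # t-ratio for r = .27 with 49 + 49 subjects
print(round(critical_r(T_05, 96), 2))  # r needed at the .05 level
print(round(critical_r(T_01, 96), 2))  # r needed at the .01 level
```

Under these assumptions the critical values agree with those given in the footnote to table 2 (.20 and .26).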


Interpretation of the U.I. 23 Dimension. A searching, full-scale interpretation of
U.I. 23 will soon be available, based on a wider array of criterion and personality
context evidence than could be reviewed here. In this paper, we will stay very close
to the manifest characteristics of the loaded tests, also emphasizing the just-dis-
covered clinical criterion relationship. Since U.I. 23 is apparently only one of the
factors in neuroticism, the question for interpretation becomes: To what aspects
(or categories) of neuroticism is U.I. 23 most closely related? The U.I. 23 markers
discovered in R6 and R8 agree in sense with older markers, suggesting at the U.I.
23(—) pole, inflexibility, effort intolerance, and inability to do fast, accurate work
(see, for example, table 3 and footnote 5). This may be summed up roughly as
“incompetence”, or inability to properly utilize the abilities one has, but it certainly
is not synonymous with low intelligence itself.⁷ Scanning of the marker tests im-
mediately suggests a relation between U.I. 23 and the classical clinical category of
neurasthenia with its nervous fatigue, incompetence, overwroughtness, paresthesia,
difficulty in making decisions, poorness of memory, and generally run-down picture.
Rather surprisingly, however, the APA Committee has no category listed under
“Psychoneurotic Disorders” which corresponds very closely to the old neurasthenia
classification. Perhaps the nearest approximation to this in the APA classificatory
scheme is a syndrome called “Inadequate Personality” listed under “Personality Pat-
tern Disturbance.” People exemplifying “Inadequate Personality” are said to show
“Inadaptability, ineptness, poor judgment, lack of physical and emotional stamina,
and social incompatibility.” In summary, we believe the most likely linkage
between U.I. 23(—) and clinical-diagnostic classifications is with neurasthenia or
inadequate personality.* However, it should be added immediately that the meaning-
fulness of a factor does not depend on its fitting neatly and completely into old
intuitive categories. Rather, we expect that the day will soon come when hypo-
thetical pre-metric clinical categories will have to be readjusted to fit the precisely
and empirically defined factor-categories actually found to characterize people.


SUMMARY 


This paper deals with U.I. 23, one of the twenty factor-dimensions discovered
among objective behavioral personality tests. Two related studies, described here,
have succeeded in checking and improving the measurement of this dimension by
modifying existing tests and developing new ones. One of the studies also correlated
score on the factor with presence in the clinically-judged category “psychoneurotic”
vs. normalcy. A significant relation was found but its low value and other related
evidence suggest that U.I. 23 is only one of several dimensions along which neurotics
and normals differ. Analysis of the nature of the dimension indicates that U.I. 23
may have its highest relation with the traditional clinical neurotic category of neur-
asthenia, or with the current conception of inadequate personality. It is also related
to, but more restricted than, Eysenck’s neuroticism factor.

⁷Low factor B does load U.I. 23 highly in R8 and some of the U.I. 23 markers do look like
mental ability tests, but the factor B (intelligence) correlation (table 2, #35) has not been confirmed
in other studies, where both special and general mental ability tests failed to load U.I. 23. In five other
studies, general intelligence measures averaged only .05 loading on U.I. 23, and in several cases,
U.I. 23 and intelligence (U.I. 1) have emerged together as distinct factors in the same study.
Given the precision with which the U.I. 23 factor was estimated in R8, we expect that the high factor
B relation was due to (a) unreliability of factor B, which is a very short intelligence measurement
and/or (b) some special sampling effect.

*The “neurasthenia” or “inadequate personality” hypotheses are also consistent with the low
relationship of U.I. 23 to the clinical criterion in R6, on the reasonable assumption that our following
of the categories in the APA manual reduced the number of and/or tendency to neurasthenia in the
neurotics selected.

A FACTOR ANALYSIS OF THE PICTURE COMPLETION
ITEMS OF THE WAIS¹ ²


DAVID R. SAUNDERS 
Educational Testing Service, Princeton, N. J. 


INTRODUCTION 

It appears that there are at least as many statistically significant dimensions
tapped by the WAIS as there are distinct subtests, but that the factor structure and
subtest structure of the battery are not congruent. Several of the significant
factors apparently depend upon content domains overlapping more than one subtest,
and the Picture Completion (PC) subtest, for one, is factorially complex. The efforts
of previous investigators to place Wechsler PC items on a unidimensional scale of
difficulty have yielded discordant results, a fact which also points to a multi-
dimensional underlying structure. The primary purpose of this study was to verify
and describe the factorial complexity of PC by a factor analysis of its constituent
items, in order to provide a basis for differential interpretation and keying of the
separate dimensions.


METHOD 

The sample of subjects was the same as that used previously, comprising a
combined group of 228 male college and college-preparatory students. Items 1 and 3
were dropped from the analysis because each of them was failed by only two subjects
in our sample. Item “PA” was added to the regular PC items, basing its scoring
upon evidence of recognition of the fish in the basket in item 7 of Picture Arrange-
ment.

Tetrachoric correlations were computed by machine, and are reported in Table
1. The factor analysis was carried out using an iterative procedure for communality
estimation. Stable results for three factors were obtained after four iterations.
The latent roots for these factors are 3.475, 1.612, and 1.232, accounting for not quite
one-third of the total variance of the 20 items. The communality estimates range
from .058 (Item PA) to .754 (Item 8). Various rotational positions were explored;
items 8, 14, and 16 were used to define the final rotation. The orthogonally rotated
factor loadings and communalities are reported in Table 2.

TABLE 2. ROTATED FACTOR MATRIX
[Columns: Item, I, II, III, h². The individual loadings are not recoverable from the source;
the bottom row of the table reads 1.819, 2.607, and 1.895 for Factors I, II, and III.]

¹This research was supported by the Society for the Investigation of Human Ecology.

²This report may be regarded as the second in a series looking toward Objective Interpretation
of the Wechsler as a personality measure.
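
The statement that the three latent roots account for "not quite one-third" of the total variance can be verified directly: with 20 standardized items the total variance is 20, so the proportion is the sum of the roots divided by 20. The short sketch below uses only figures given in the text; the function name is illustrative.

```python
def proportion_of_variance(latent_roots, n_items):
    """Share of total variance (n_items, for standardized items) carried by the roots."""
    return sum(latent_roots) / n_items


roots = [3.475, 1.612, 1.232]  # latent roots reported for the three factors
share = proportion_of_variance(roots, 20)
print(round(share, 3), share < 1 / 3)
```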


RESULTS³


Only items 11 and 19 appear to represent appreciable violations of the positive 
manifold bounded by the planes of rotated factors, but both of these items are of 
comparatively low communality. The simple structure is therefore judged to be 
satisfactory. 

Factor I is most highly correlated with Item 18, which is the same item used by
Rapaport to illustrate what he termed “increase of distance from the picture” or
“impaired contact with reality” as one of three major sources of failure on PC items.
By elimination (cf. below), Factor I may also be identified with Factor VI of the
previous analysis of split-half subtest scores for this sample.

Factor II is most highly correlated with items 8 and 9. The most common
wrong responses to these items are “Bow” and “Oarsman,” respectively, and such
responses may be judged to be psychologically similar to what Rapaport terms
“loss of distance” as a second source of failure. “Maintenance of perspective” ap-
pears to provide a suitable rubric for the positive pole of this factor.

Consideration of the fate of the odd- and even-numbered PC items in Table 2,
in the light of the previous analysis, suggests that Factor II is very similar to the
previous Factor IX. This leads to the prediction that certain Comprehension and
Arithmetic items should correlate selectively with the items of Factor II. That such
items can be found is illustrated by the values in Table 3.


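TABLE 3. C vs PC ITEM INTERCORRELATIONS
[Decimal points omitted. Columns: C #5, C #6, C #11, C #13. Several entries are missing in
the source, so the column position of the surviving values in incomplete rows is uncertain.]

PC #18   —10    19   —30   —17
   #16    03   —03    01
   #12    05   —06    03

PC #8     16    26    20
   #9     21    06
   #13    15    17

PC #2    —33    34
   #14   —14    00
   #10    16    12
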

Factor III is most highly loaded by items 2, 10, and 14, which are precisely the 
three items used by Rapaport “) to illustrate failure in PC when a “query for inform- 
ation replaces concentration,” his remaining major source of failure. In the case of 
this factor it seems to us that awareness of uncertainty must be primarily involved, 
since subjects may suspect a correct answer without having the confidence to guess 
it overtly. 

Re-examination of Factor XI in the previous study of this sample® suggests 
that it may be very close to the present Factor III, and should not have been re- 
jected. If so, then the PC items correlating with Factor III should exhibit selective 
correlation with the Object Assembly (OA) items. Verification for this prediction 
may be found in Table 4. 

We have not yet isolated any items from any other subtest that correlate 
selectively with Factor I, and postulate that this factor embodies a unique contri- 
bution of PC to the Wechsler battery. 


*A more extensive discussion of these results may be found in ®). 
TABLE 4. OA vs PC INTERCORRELATIONS 
OA #1 OA #2 OA #3 OA #4 


PC #18 22 11 19 
16 15 13 10 
12 —20 00 00 


PC #8 —04 17 17 
9 —16 12 15 
13 29 39 13 


PC # 2 21 54 26 
14 22 11 31 
10 46 42 29 













SUMMARY 
The PC subtest of the WAIS is found to depend on three orthogonal factors: 
I. Maintenance of Contact; II. Maintenance of Perspective; and III. Effect of 
Uncertainty. 
MEASURES OF OVER-CONTROLLED AND UNDER-CONTROLLED 
BEHAVIOR: A VALIDATION¹ ² 


CHARLES Y. NAKAMURA 
University of California, Los Angeles 


PROBLEM 

Two of the most frequently used objective measures of behavior in recent years 
have been the MMPI and the Taylor Manifest Anxiety Scale (TAS). A major prob- 
lem that confronted experimenters who used these instruments was the frequent 
finding that they were useful where interest lay only in establishing differences be- 
tween various groups of subjects but were not adequate for predicting the behavior 
of individuals. A clear demonstration of this is seen in the results of studies that 
employed the TAS “: ©) and the MMPI“: *: 7) to evaluate outcome of psychotherapy. 
The tests differentiated groups of patients judged improved from those judged un- 
improved, but they did not predict individual changes. The lack of discriminatory 
power may be attributed, in part, to the fact that there is considerable uncertainty 


1Read at the APA convention in Cincinnati, September 1959. 
2Supported in part by a grant from the research committee of the University of California. 
as to what these tests measure. The TAS, for example, was initially devised to be a 
measure of anxiety drive level“ but its validity in that regard remains controver- 
sial*). Erickson and Davids®) suggested that the TAS is primarily an indicator of 
mode of responding to stress and that it measures overt as compared to inhibitory 
responses as ways of handling anxiety. This further suggests that it may be a reason- 
ably valid indicator of change of behavior in persons who express disturbance via 
clearly overt responses but is relatively insensitive to change in those who manage 
anxiety primarily through inhibition of activity. The purpose of this study was to 
establish the validity of some measures that would separately account for different 
modes of responding, specifically over-control and under-control, that remain un- 
differentiated in scores on scales such as the TAS. 

Block³ has been doing work pertinent to this problem. He constructed scales 
designed to measure conceptually independent dimensions of behavior categorized 
as “neurotic over-control” (NOC) and “neurotic under-control” (NUC). An espec- 
ially promising property of these scales was the nonsignificant correlation between 
the scores on them. Apparently, they measured different types of ineffective be- 
haviors. Persons who scored high on NOC were described by judges to be reluctant 
to enter into new experiences, self-conscious especially in social situations, overly 
responsive to other people’s evaluations rather than their own, and the like. On the 
other hand, high scorers on NUC were described as those whose heightened anxiety 
level is expressed through physical signs of tenseness, restlessness, embarrassment, 
and whose inability to control impulses becomes manifest in acting out, externalizing 
and even anti-social behavior. Although scores on NOC and NUC were orthogonal, 
since they were purported to be measures of maladjustive behavior, both would be 
expected to correlate positively with any measure of general psychological mal- 
adjustment insofar as both over-control and under-control are usually manifested 
by disturbed persons. Block found support for this in that both NOC and NUC 
correlated moderately with a general neuroticism scale (Pn) developed by himself. 


PROCEDURE 

An inventory was constructed comprising the Taylor Manifest Anxiety Scale, 
the K scale of the MMPI, filler items, and four scales developed and revised by Block 
to include only items from the California Psychological Inventory.⁴ The latter four 
scales were NOC, NUC, Pn and Appropriate Over-control (OCA), a scale devised to 
measure relatively effective ways of behaving though in the over-controlled direction. 
The inventory was administered to groups of 88 men and 27 women from an intro- 
ductory psychology course at the University of Minnesota and readministered after 
an interval of three weeks. A shorter 176-item inventory⁵ with fewer filler items was 
administered to 49 men and 34 women in a course in abnormal psychology, 44 men 
and 45 women applicants to the psychiatric division of the University Student 
Health Service at the University of California, Los Angeles, and to 50 men in high 
level business managerial positions in the Los Angeles area. 


RESULTS AND DISCUSSION 


The figures in the diagonal column of Table 1 show that the test-retest (three- 
week interval) reliability of each scale was quite high, ranging from .81 to .89. They 
are averaged correlations for the men and the women in the introductory psychology 
group. The inter-scale correlations for the women Ss are presented above the diag- 
onal and for the men Ss, below the diagonal. With few exceptions, the size of the 


3Personal communication from Dr. Jack Block, University of California. His provision of data in 
advance of his report, in preparation, is gratefully acknowledged. 
4Copyright 1956, Consulting Psychologists Press, Inc., Palo Alto, California. The items were 
used and presented in this report with the permission of the publisher. 
5The 7-page Inventory, including scoring and item identification, has been deposited with the 
American Documentation Institute. Order No. 6085, remitting $1.25 for 35-mm. microfilm or $1.25 
for 6 x 8 in. photo-copies. 
TABLE 1. INTERCORRELATIONS BETWEEN SCALES FOR FOUR GROUPS OF SUBJECTS (CORRELATIONS 
FOR WOMEN ARE ABOVE, FOR MEN BELOW, THE DIAGONAL) 

Groups: 
1a) Introductory Psychology, 1st admin.; 
1b) Introductory Psychology, 2nd admin.; 
2) Abnormal Psychology course; 
3) Applicants to psychiatric service; 
4) High level business personnel; 

Scales: 
NOC—Neurotic over-control (29 items) OCA—Over-control, appropriate (20 items) 
NUC—Neurotic under-control (25 items) TAS—Taylor Manifest Anxiety (49 items) 
Pn—Psychoneurosis (19 items) K—K scale of MMPI (30 items) 


correlations between any pair of scales was quite stable over all groups of Ss. This 
demonstrated that the inter-scale correlations were consistent even though the 
groups differed on such variables as age, sex, geographical location of residence and 
psychological adjustment. The near zero, nonsignificant correlations (p > .05) be- 
tween NOC and NUC for all groups supported the contention of independence of 
the scores on these scales. Although the correlation coefficients between the other 
scales are included in Table 1, the levels of statistical significance are not given here 
since the inter-correlations were computed for comparative purposes. Though in- 
formative they were not especially important for meeting the primary objective of 
this report, to establish the construct validities of the NOC and NUC scales and to 
show the potential usefulness of them for the assessment of effects of psychotherapy. 

As expected, NOC and NUC were correlated moderately highly with Pn, general 
neuroticism, and each accounted for part of the variance in the TAS scores with a 
general tendency for a larger proportion to be accounted for by NUC than by NOC. 
This was true except for the reversals for the women in the introductory psychology 
group and for the managers. 

Table 2 shows the mean scores on the six scales for the different groups and the 
significance levels for the difference between the means of the psychiatric group and 
the corresponding mean scores for each of the introductory psychology (first admin- 
istration), the abnormal psychology, and the business personnel groups. The sig- 
nificantly higher means of the psychiatric group of men on NOC, NUC, Pn and TAS 


TABLE 2. MEAN SCALE SCORES FOR FOUR GROUPS OF SUBJECTS AND SIGNIFICANT DIFFERENCES FROM 
THE PSYCHIATRIC GROUP 
Psychiatric Service   Introductory Psychology   Abnormal Psychology   Business Personnel 
M SD M SD 
11.38* 11.39* 
9.82* 9.59** 
6.59** 6.35** 
14.04* 
12.69** 
17.47** 


(N = 34) 







14.30 
1.51 









13.78 
12.09 
10.13 
11.98 
22.95 
13.90 
















*.01 level of significance. 
**.001 level of significance. 


provided support for the validity of the scales as measures of psychological disturb- 
ance. The results for the women were similar except for the nonsignificant differ- 
ences between means on NOC. TAS was one of the measures that most clearly 
differentiated the psychiatric from the other groups for both men and women Ss. 
However, in attempts to predict changes in individuals resulting from psycho- 
therapy, the combined use of the over-control and under-control scales may be the 
more meaningful, especially for patients who manage anxiety via inhibition. In these 
people, it would be expected that judged improvement would be paralleled by a de- 
crease in the measure of over-control even though little or no change may occur in 
under-control or scores on scales of general neuroticism or anxiety such as Pn, TAS, 
and some of the MMPI scales on which inhibited persons would likely not have 
scored particularly high at the beginning of therapy. 


SUMMARY 


An inventory comprising six scales was administered to four different groups of 
subjects to assess the stability of the inter-scale correlations over different samples. 
Of primary interest were two of these scales that were purported to measure “neu- 
rotic over-control” and “neurotic under-control” types of behaviors. The inter- 
scale correlations between these two and each with the Taylor Manifest Anxiety 
Scale demonstrated that, as hypothesized, the scores on over-control and under- 
control though independent of each other correlated positively with the Taylor 
scale. The potential usefulness of the over-control and under-control scales, used in 
combination, to measure change resulting from psychotherapy was discussed. 
A SCALE FOR SELF DESCRIPTION¹ 
AUSTIN E. GRIGG AND H. PAUL KELLEY 
The University of Texas 


INTRODUCTION 

Self-report techniques have been utilized to measure personality attributes 
since Woodworth’s“) Personality Inventory. The traditional self-report method- 
ology requires the S to respond to a series of statements by indicating whether they 
are true or false when applied to himself, and his responses are scored by a key de- 
veloped after an item analysis of the responses given by variously defined criterion 
groups. In some instances, keys have been derived by “expert’’ judgments about 
the implications of the direction of the response to the specific items. Another self- 
report technique, much less standardized and not as easily quantified, is the sentence 
completion technique, especially as developed by Rotter®’. Here the individual 
completes sentences from brief word stems under instructions to write his true feel- 
ings about the sentence stems. The responses may be categorized as reflecting ad- 
justment or maladjustment, but the chief use of the method seems to have been to 
obtain some reflection of the individual’s personality by means of a qualitative an- 
alysis of the sentences. 

The technique for assessing self descriptions with which this paper is concerned 
lies somewhere between the above two methods of self-report, and consists of having 
subjects complete three sentence stems by selecting adjectives to describe their 
feelings, their study or work habits, and their social reaction pattern. Three ad- 
jectives are selected from a pool of 12 adjectives to describe feelings; three adjectives 
from a different pool of 12 adjectives to describe the study and work habits, and three 
adjectives from a third pool of 12 adjectives to describe social reactions. The words 
making up each of the pools of adjectives have been scaled for adjustment value. 


1This research was sponsored by a research grant awarded to the University of Texas Testing and 
Counseling Center by the Hogg Foundation for Mental Health. 
SELF DESCRIPTION 


Below are some statements that we would like to have you complete about yourself. Fill in each blank 
with a word from the suggested list following each statement. For any given blank, you may select 
any word from the list. 


An exact word to fit you may not be in the list, but select the words that seem to fit most closely the 
way you are. 


Select any three words from this list: 


alert calm 
anxious cheerful 
apathetic despondent 


2. When I study or work, I seem to be 


Select any three words from this list: 

confident rattlebrained 
rational 
sluggish 


disorganized 


Select any three words from this list: 


argumentative defensive ised 
confident friendly temperamental 


THE SCALE 


In order to obtain scale values for the adjectives, clinical and counseling psy- 
chologists were asked to rank the adjectives in each pool as to the degree of adjust- 
ment involved. The 81 judges consisted of 70 diplomates in clinical or counseling 
psychology who responded by mail and 11 non-diplomate clinicians or counselors 
(with doctoral degrees) working at the State University of Iowa Counseling Service, 
the University of Nebraska Counseling Center, or the University of Texas Testing 
and Counseling Center. 

The judges were given the three sets of 12 adjectives and asked to rank the ad- 
jectives within each set according to the degree of adjustment each adjective would 
indicate when used by college clients to describe themselves. These judgments of 
rank-orders provided the basis for deducing the proportions of times each adjective 
was perceived as indicating better adjustment than each other adjective. From the 
deduced proportions, scale values were obtained for the adjectives within each set by 
using what Torgerson“ refers to as the traditional solution for incomplete matrices 
for Condition C of Thurstone’s Law of Comparative Judgment. (This scaling pro- 
cedure is known more familiarly as the traditional procedure for Case V of the Law 
of Comparative Judgment). Table 1 shows the three lists of adjectives and their 
original scale values. 
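
The scaling step described above can be illustrated with a minimal Python sketch. It assumes that every judge supplies a complete ranking, and it replaces the incomplete-matrix solution cited above with a simple clipping of extreme proportions, so it is a simplified stand-in and not the procedure actually used for Table 1. The function name `case_v_scale_values` and the `clip` parameter are hypothetical. The final rescaling (multiplication by 100, lowest value set to zero) follows the note to Table 1.

```python
from statistics import NormalDist


def case_v_scale_values(rankings, clip=0.02):
    """Case V scale values from judges' rank orders (illustrative sketch).

    rankings: one list per judge; rankings[k][i] is the rank judge k gave
    to adjective i, where rank 1 means "best adjusted".
    clip: assumed bound that keeps proportions away from 0 and 1 so the
    normal deviate stays finite (a simplification, not the cited solution).
    """
    n_judges = len(rankings)
    n_items = len(rankings[0])
    normal = NormalDist()
    raw = []
    for j in range(n_items):
        z_sum = 0.0
        for i in range(n_items):
            if i == j:
                continue  # an item compared with itself contributes z = 0
            # Proportion of judges placing adjective j above adjective i.
            p = sum(r[j] < r[i] for r in rankings) / n_judges
            p = min(max(p, clip), 1.0 - clip)
            z_sum += normal.inv_cdf(p)
        raw.append(z_sum / n_items)  # column mean of the z matrix
    lowest = min(raw)
    # Multiply by 100 and set the lowest value to zero, as in Table 1.
    return [round(100 * (value - lowest)) for value in raw]


# Four hypothetical judges ranking three hypothetical adjectives.
example = case_v_scale_values([[1, 2, 3], [1, 2, 3], [1, 3, 2], [2, 1, 3]])
```

With these invented rankings the first adjective receives the highest scale value and the third receives zero; the absolute values depend on the assumed `clip` bound.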

An examination of Table 1 shows that for all three areas there are places within 
the scale where there are wide gaps between the scale values for adjacent adjectives. 
It is desirable to develop scales which are composed of adjectives more evenly spaced 
along the continuum of adjustment value. The wide gaps in scale values between 
adjacent adjectives are a reflection of a fairly small degree of overlap or ambiguity 
in the judgments regarding these adjectives. This in turn means that the scale values 
for certain adjectives, as determined by this method of scaling, are relatively un- 
reliable. It was felt, however, that sufficient reliability was obtained in order to test 
the hypothesis that scores determined from scales such as these would in fact be 
related to counseling variables. 







TABLE 1. ORIGINAL SCALE VALUES OF THE ADJECTIVES* 





Feelings Study and Work Social Reactions 


alert adaptable argumentative 
anxious 494 awkward 
apathetic confident 
calm secident defensive 
confused 


despondent 
ng efficient 
irritable 
nervous 
relaxed rattlebrained 
tense rational 
whiny sluggish 













*Obtained scale values have been multiplied by 100. To avoid negative scores, the 
lowest value was set to equal zero. No attempt was made to establish a rational 
zero point. 


With the assignment of scale values to the adjectives, it becomes possible to 
compute scores for each respondent in the three areas (feelings, study or work habits, 
and social reactions). The score for each area is obtained by summing the scale 
values of the three adjectives selected by the respondent in that particular area. 

While the scaling described above was being carried out, the Self Description 
sentences were administered to two groups of college undergraduates: 282 clients of 
the University of Texas Testing and Counseling Center who filled out the sentences 
prior to their first counseling interview, and 138 University of Texas students in two 
introductory psychology classes and one upperclass course in child psychology. The 
students’ responses were scored, and the obtained means and standard deviations 
are shown in Table 2. 


TABLE 2. MEANS AND STANDARD DEVIATIONS OF ORIGINAL SCALE-VALUE SELF 
DESCRIPTION SCORES OF CLIENTS AND NON-CLIENTS 


Scale          Group          N      Mean 


Feelings       Clients        282    875.82 
               Non-Clients    138    979.81 
               Combined       420    909.99 


Study-Work     Clients        282    977.72 
               Non-Clients    138    1116.11 
               Combined       420    1023.19 


Social         Clients        282    628.80 
               Non-Clients    138    669.50 
               Combined       420    642.18 





Since the “raw score” units for all three scales are large and clumsy for practical 
clinical use, each score distribution for the total group of respondents was converted 
by a linear transformation to a scale which yielded a mean score of 10 and a standard 
deviation of 3 for the combined groups. These transformed scale scores, hereafter 
referred to as converted scores, were then rounded to the nearest whole number. 
Table 3 presents the equations and conversion tables for making these transforma- 
tions. In addition to making the scores more practical for clinical use, these trans- 
formations render the three scales somewhat comparable in the sense that the scores 
indicate relative positions within the group. 
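
The linear transformation can be sketched in a few lines of Python. The function names are hypothetical, and the rounding rule (halves rounded upward) is an assumption, since the text says only that scores were rounded to the nearest whole number. The coefficients .01193 and 2.34010 in the usage lines are the Social equation given with Table 3.

```python
import math


def conversion_equation(mean_raw, sd_raw, target_mean=10.0, target_sd=3.0):
    """Slope and intercept of C = slope * X + intercept.

    Chosen so that the raw-score distribution with the given mean and
    standard deviation maps onto target_mean and target_sd.
    """
    slope = target_sd / sd_raw
    intercept = target_mean - slope * mean_raw
    return slope, intercept


def converted_score(raw, slope, intercept):
    """Apply the equation and round to the nearest whole number."""
    return math.floor(slope * raw + intercept + 0.5)


# Social Reactions equation as printed with Table 3.
social_slope, social_intercept = 0.01193, 2.34010
score_at_mean = converted_score(642.18, social_slope, social_intercept)
score_low = converted_score(200, social_slope, social_intercept)
```

A raw Social score equal to the combined-group mean (642.18) converts to 10, and a raw score of 200 converts to 5, which agrees with the 182-265 band of Table 3.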

To estimate the reliability of the Self Description scores, a group of 64 students 
in introductory psychology were given the instrument twice, with a five week interval 
between administrations. The test-retest reliabilities after the five week interval for 
TABLE 3. TRANSFORMATION OF ORIGINAL SCALE-VALUE SCORES TO CONVERTED 
SCORES* 


Wtd     Feelings Raw     Study Raw     Social Raw 


 4      220-307 
 5      308-421                        182-265 
 6      422-534                        266-349 
 7      535-648                        350-433 
 8      649-761                        434-517 
 9      762-874 
10      875-987                        602-685 
11      988-1101                       686-763 
12      1102-1213                      769-852 
13      1214-1326                      853-936 
14      1327-1439        1474-1602     937-1021 


*Feelings: C = .00884X + 1.95568 
Study: C = .00776X + 2.06138 
Social: C = .01193X + 2.34010 
Where $C = \frac{3}{\sigma}X - \left(\frac{3M_X}{\sigma} - 10\right)$; $\sigma$ is standard 
deviation; $X$ is raw score; $M_X$ is mean of distribution of raw scores. 


the three area scores, based on the 64 non-client sample, were found to be: Feelings, 
.345, Study and Work, .436, and Social Reactions, .682. Examination of the data 
indicated a marked restriction of the range of scores for the Feeling and the Study- 
Work areas, related, no doubt, to the use of normal non-clients in the test-retest 
study. When these correlations were corrected for restricted range, the corrected 
estimates of reliability were found to be: Feelings, .518, and Study-Work, .766. 
There was no justification for making a correction for restricted range with the Social 
Reactions data. 
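
The text does not state which correction for restricted range was applied. As an illustration only, the sketch below uses one standard formula for reliability coefficients, which assumes that error variance is the same in the restricted and the unrestricted group. The function name and the standard deviations in the usage line are hypothetical and are not taken from the study.

```python
def reliability_corrected_for_range(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate reliability in the wider group from a restricted sample.

    Assumes equal error variance in both groups, so that
    r_unrestricted = 1 - (sd_restricted**2 / sd_unrestricted**2) * (1 - r_restricted).
    """
    variance_ratio = (sd_restricted / sd_unrestricted) ** 2
    return 1.0 - variance_ratio * (1.0 - r_restricted)


# Hypothetical example: an observed reliability of .40 in a sample whose
# standard deviation is half that of the full group.
corrected = reliability_corrected_for_range(0.40, 1.5, 3.0)
```

When the two standard deviations are equal the coefficient is unchanged; the narrower the restricted sample, the larger the upward correction.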


SCORES OF CLIENTS VS. NON-CLIENTS 


In order to evaluate the ability of the three scales to discriminate well adjusted 
from less well adjusted college students, scores made by three groups of college 
students on the three areas of the instrument were compared. From the 282 clients 
of the University of Texas Testing and Counseling Center, two groups were drawn: 
50 clients presenting personal adjustment problems throughout their counseling, and 
157 clients with educational-vocational problems. (A residual group of clients was 
not used because of ambiguity in classifying their problem, or because their counsel- 
ing status was “in progress” and they had reported for fewer than three interviews 
at the time of the study.) The 138 undergraduate psychology students served as a 
non-client control group. Table 4 presents the converted score means and standard 
deviations of the three groups. From Table 4, it may be seen that the control group 
scored higher (better adjusted) than the two client groups in each of the three areas 


TABLE 4. MEANS AND STANDARD DEVIATIONS OF CONVERTED SCORES 





Scale Group N 


Feelings Control 138 
Vocational 157 
Personal 50 


Study-Work Control 138 
Vocational 157 
Personal 50 


Social Control 138 
Vocational 157 
Personal 50 


of the instrument. Table 5 presents results of testing for the significance of the 
differences between the groups. 

From Table 5, it may be seen that all three scales discriminated between per- 
sonal counseling clients and non-client controls. Both the Feelings and Social Re- 
actions scales discriminated the vocational clients from personal counseling clients, 


TABLE 5. VALUES OF t COMPUTED TO TEST THE SIGNIFICANCE OF DIFFERENCES BETWEEN MEAN 
CONVERTED SCORES OF CLIENT AND NON-CLIENT GROUPS 


              Feelings Scale          Study-Work Scale        Social Scale 
              Vocational  Personal    Vocational  Personal    Vocational  Personal 


Control       1.18        5.01**      2.62*       3.64**      0.81        3.75** 
Vocational                4.14**                  1.76                    3.45** 


*Significant at the .01 level 
**Significant at the .001 level 








but not from the non-client controls, while the Work scale discriminated the voca- 
tional clients from the controls but not from personal clients. Thus the educational- 
vocational group differed from the controls only in the one area clearly related to 
their problems, and they did not differ in the other areas more clearly linked to 
“general personality’. These results seem to be logically consistent with Grigg and 
Thorpe’s®) study based on different client and non-client samples in which it was 
found that educational-vocational clients respond more like non-client controls 
than like personal counseling clients. 

To study changes in scores for the three areas (Feelings, Study-Work Habits, 
and Social Reactions) among the client groups from pre-counseling to post-counseling 
testing, a study was made of the converted scores for the clients’ responses on these 
two occasions. This study was conducted on 28 personal counseling clients who took 
the Self Description instrument prior to counseling and again immediately after 
completing their series of counseling interviews, and on 83 educational-vocational 
clients who completed the instrument both pre- and post-counseling. The 64 non- 
client control cases initially utilized to obtain the test-retest reliability data reported 
earlier in this paper were also included. In order to check whether the mean scores 
in the three areas changed significantly during the interval of time between the test 
and retest administrations, the t test for changes in correlated data was used. The 
results of this analysis are presented in Table 6. 
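
The t test for correlated data can be sketched as follows. This is a generic paired-difference computation, not a reproduction of the analysis summarized in Table 6; the function name and the scores in the usage lines are invented for illustration.

```python
import math
from statistics import mean, stdev


def correlated_t(first, second):
    """t statistic and degrees of freedom for paired (test-retest) scores.

    t = mean difference / (SD of differences / sqrt(n)), with n - 1
    degrees of freedom, where each difference is second minus first.
    """
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical converted scores for four respondents tested twice.
t_value, df = correlated_t([8, 9, 10, 11], [10, 10, 12, 12])
```

A positive t indicates that the second testing produced higher mean converted scores than the first; the value is then referred to a t table with the returned degrees of freedom.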


TABLE 6. DIFFERENCES BETWEEN INITIAL AND SECOND TESTING MEAN CONVERTED SCORES FOR 
CLIENT AND NON-CLIENT GROUPS 


Test Retest 
M SD M SD 


*Significant beyond .05 level 
**Significant beyond .01 level 
Both client groups displayed statistically significant improvement in the mean 
converted scores following counseling on the Feelings Scale, and the vocational group 
also improved significantly on the other two scales, whereas the control group did not 
exhibit any significant shift in converted mean score from the first to the second test- 
ing. This latter fact indicates that the significant improvement in mean scores after 
completion of counseling is probably not due simply to the unreliability of the meas- 
uring instrument. Whether or not such changes would occur in a client population 
which received no counseling is as yet undetermined, since this particular non-client 
group did not provide that kind of control. 


SUMMARY 

An instrument was constructed in which subjects are asked to complete short 
sentence stems on feelings, work and study habits, and social reactions by selecting 
three adjectives from a pool of 12 adjectives for each of the three areas. Clinical 
psychologists and counseling psychologists were asked to rank the adjectives in each 
pool as to the degree of adjustment the adjectives indicated. These judgments of 
rank-orders were used to obtain scale values for the adjectives within each set by 
Case V of Thurstone’s Law of Comparative Judgment. From the scale values of the 
adjectives, it is possible to compute scores for a respondent in three areas: feelings, 
study and work habits, and social reactions. The population on which the instrument 
was standardized consisted of 282 counseling clients and 138 non-clients, all uni- 
versity students. The instrument was able to discriminate clients from non-clients; 
also scores on the scales for the client groups changed significantly following therapy, 
but no significant changes occurred after a five week interval in a non-client control 
sample. 
RESEARCH NOTE ON COLUMBIA MENTAL MATURITY SCALE (CMMS) 
AND REVISED STANFORD BINET (L) IN A PRESCHOOL POPULATION 


BORIS M. LEVINSON AND ZELICK BLOCK 
Psychological Center, Yeshiva University 


The Columbia Mental Maturity Scale (CMMS) has recently been revised 
with 17 new items substituted for old ones, the order of administration of most 
other items changed in order of difficulty, and with new norms established. This 
study is an evaluation of the Revised CMMS in a preschool setting. 


PROCEDURE 
Thirty-nine Revised Stanford Binet (L)® and CMMS records of children, 
aged from 4.0 to 5.9 years, and with test scores considered valid by the examiner, 
were selected for the study. The mean CA of the subjects was 4.83. The CMMS was 
administered first followed by the Binet with 11 subjects and the Binet first followed 
by the CMMS in 28 subjects. 







TABLE 1. THE MEANS AND SDs OF MAs AND IQs OF REVISED STANFORD BINET 
(L) AND COLUMBIA MENTAL MATURITY SCALE 








Tests N Mean Mean SD 
Revised Binet (L) 
Total 39 


Boys 17 
Girls 22 








Total 39 5.28 
Boys 17 5.27 
Girls 22 5.28 














RESULTS AND DISCUSSION 


Table 1 compares the MAs and IQs of CMMS and Revised Stanford Binet (L). 
In our population, the CMMS IQ is lower than the Binet IQ, and sex factors 
do not influence the scores. The correlations between CMMS and Binet MA and 
IQ were .45 and .39 respectively. 
On the basis of our experience with the test, the following factors should be con- 
sidered in the evaluation of CMMS results. 


(a) Practice effects. In cases in which the CMMS was administered first, 
the subject invariably passed items IV-6, V-3, and VI, 5 (Pictorial Likenesses 
and Differences) on the Revised Binet; in 28 cases where the Binet came first, 
7 children failed to pass VI, 5. 

(b) Perseveration. The correct answer appears consecutively as follows: 
once, 4 times; once, three times; and 21 times twice. A child who perseverated 
on an answer was either penalized or rewarded depending upon whether he 
initially scored correctly. Personality variables may thus account for chance 
successes or failures. 


(c) Rigidity in scoring. A child is not credited for indicating a correct 
response even though he knows it and names it, as long as he doesn’t point it out. 
This serves to penalize some children who insist upon naming the object rather 
than pointing it out. 

(d) Order of difficulty. The items did not appear to be arranged in order of 
difficulty. Thus, card 26 had more failures than cards 27 through 36. Card 46 
was more difficult than cards 47 through 51. 


SUMMARY 
The Revised Stanford Binet (L) and Revised Columbia Mental Maturity Scale 
were administered to 39 preschool children. The correlations between CMMS and 
Binet MA and IQ were .45 and .39 respectively. It appears that further revisions in 
item arrangement, administration, and scoring are indicated. 
STIMULUS VALUE OF RORSCHACH INKBLOTS AS MEASURED BY 
THE SEMANTIC DIFFERENTIAL¹ 


MELVIN ZAX AND ROBERT H. LOISELLE 
The University of Rochester 


PROBLEM 


The interpretation of responses to the Rorschach inkblots depends in part upon 
a knowledge of the stimulus value of the blots themselves. Those who have used the 
test extensively have speculated about the meanings many of the cards have for most 
Ss. Bochner and Halpern®? devoted a chapter to such theorizing and Phillips and 
Smith” have acknowledged this tendency to interpret on the basis of the signi- 
ficance of specific cards. 

A number of studies have been stimulated by such clinical hunches with the 
object of establishing a controlled empirical basis for these notions. Many of the 
studies have been devoted primarily to determining the existence of a “mother” 
card and a “father” card. Meer and Singer“) found that Cards II and IV were 
selected to a significantly high degree as the “father” cards while VII and X were 
most often selected as the “mother” cards. The findings in a study by Rosen “® were 
consistent with those of Meer and Singer with regard to cards IV and VII and in 
addition card VI was associated with the male sex organ and card X was seen both 
as the family symbol and as suggesting emotional insecurity. Recently Levy had 
27 Ss (second graders) match each card with one doll from a set of five which de- 
picted a man, a woman, a boy, a girl and a baby. He found that cards IV and VI 
were matched with the man significantly often while card IX, and not VII, was 
matched with the woman and girl a significantly large number of times. Richards ® 
reviewed some of these studies and a number of others and rank ordered the cards 
for preference, reaction time, productivity and card rejection based on a combination 
of the findings of others. 

The sexual significance of the blots was studied by Shaw®? who found that in 
the case of all of the cards but VI more female than male responses were made. 
Pascal, Reusch, Devine and Suttell “) confirmed these findings using both males and 
females as Ss. They also found that both sexes agreed upon the “femaleness” of 
cards I, IV, VII, VIII and IX. Cards I, III, II and VI were found to be the most 
sexually suggestive. 

While these studies have established some gross relationships, they have not 
delved into many of the components which go toward making up such relationships. 
Neither have they systematically determined the way in which age and sex differ- 
ences are related to the stimulus value of the blot. The purpose of this study is to 
examine such relationships for a particular age group and to investigate differences 
between cards along a wider series of dimensions using the Semantic Differential. © 


METHOD 


Ss for this study were 40 male and 40 female undergraduate students at the 
University of Rochester, none familiar with the Rorschach blots. The mean age for 
the males was 19.97, range 17 years 11 months to 23 years 1 month. For the females 
the mean age was 19.62, range 18 years 3 months to 20 years 9 months. 

Seven Semantic Differential ©) scales were selected from each of the three factors 
on the basis of their factor loadings and also on their judged relevance to the ink- 
blots. Thus a scale like ‘‘good-bad” which has a high evaluative factor loading was 
used but “sweet-sour” which also has a high loading for the same factor was not 
used. These scales were randomly arranged in 10 different orders and booklets were 
assembled in such a way that the 10 orders fell in random positions in order to control 
for any order effects operating in the successive ratings. The particular scales used 
are listed in Table 1. 
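
The counterbalancing of scale orders described above can be sketched in code. The following Python fragment is only an illustration and not the authors' actual procedure: the function names, the fixed seed, and the assumption that each booklet holds one page per card with the ten orders falling in shuffled positions are all hypothetical. The grouping of scales by factor is inferred from the order of Table 1 and from the discussion of the cards.

```python
import random

# The 21 scales named in Table 1 (factor grouping inferred, see lead-in).
SCALES = [
    "beautiful-ugly", "clean-dirty", "fair-unfair", "good-bad",
    "happy-sad", "kind-cruel", "wise-foolish",            # evaluative
    "brave-cowardly", "hard-soft", "heavy-light", "large-small",
    "masculine-feminine", "rough-smooth", "strong-weak",  # potency
    "active-passive", "angular-rounded", "fast-slow", "hot-cold",
    "reckless-cautious", "sharp-dull", "tense-relaxed",   # activity
]


def make_scale_orders(rng, n_orders=10):
    """Return n_orders distinct random permutations of the scales."""
    orders = set()
    while len(orders) < n_orders:
        orders.add(tuple(rng.sample(SCALES, len(SCALES))))
    # Sorted so the result does not depend on set iteration order.
    return sorted(orders)


def make_booklet(orders, rng):
    """Place the scale orders on the booklet pages in random positions."""
    pages = list(orders)
    rng.shuffle(pages)
    return pages


rng = random.Random(0)  # hypothetical fixed seed, for repeatability only
orders = make_scale_orders(rng)
booklet = make_booklet(orders, rng)  # one page (scale order) per card
```

Drawing a fresh booklet for each S with `make_booklet(orders, rng)` would spread any order effect across cards rather than tying one scale order to one card.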


1The authors are grateful to R. F. Green of the University of Rochester for his advice regarding 
the statistical handling of the data and to F. G. Benham and J. K. Woodward for their assistance in 
preparing and processing the data through the IBM 650 computer at the University of Rochester. 
Procedurally, Ss were given booklets containing the standard Semantic Differ- 
ential instructions ©) with only minor modifications to account for the fact that ink- 
blots and not verbal concepts were being rated. After reading the instructions and 
being given an opportunity for questions (there were very few), each S was seated in 
an individual testing booth with a standard set of inkblots and made the 21 ratings 
for each card taken in the usual order from I to X. The time required to complete 
the series of ratings ranged from approximately 20 minutes to 50 minutes. 


RESULTS 


Ratings for each card on each of the 21 scales were tabulated separately for 
males and females. Initially, an analysis of the directional trends of the ratings was 
made. It was determined whether the ratings on a given card fell nearest the left or 
the right end of the continuum in a significantly high number of cases. Ratings under 
four, of course, were closest to the left end while ratings over four were closest to the 
right end. Ratings of four presented a special problem since on the Semantic Differ- 
ential they represent either a neutral rating with respect to the particular scale or 
the feeling that the scale is totally irrelevant. Since there is no way to distinguish 
between these possibilities, all ratings of four were eliminated in the chi square 
analyses. Observed frequencies were compared to the theoretical probability that, 
by chance, half of the ratings would be in one direction and half in the other. It 
should be noted that in most cases, when the chi square for a particular scale was 
significant, relatively few ratings of four were made. The results of the chi square 
analyses are reported in Table 1.² 
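
The directional analysis just described reduces to a one-degree-of-freedom chi square against an even split. A minimal sketch follows, assuming seven-step ratings with four as the neutral point; the function name and the sample ratings are hypothetical, and the .05 critical value of 3.841 is the conventional tabled figure rather than a level stated in the text.

```python
def directional_chi_square(ratings, neutral=4):
    """Chi square (1 df) for left-versus-right ratings, neutral ratings dropped."""
    left = sum(1 for r in ratings if r < neutral)
    right = sum(1 for r in ratings if r > neutral)
    n = left + right  # ratings equal to the neutral point are eliminated
    if n == 0:
        return left, right, 0.0
    expected = n / 2  # chance expectation: half in each direction
    chi2 = ((left - expected) ** 2 + (right - expected) ** 2) / expected
    return left, right, chi2


# Hypothetical ratings of one card on one scale by twelve Ss.
ratings = [1, 2, 2, 3, 4, 4, 5, 1, 2, 3, 6, 2]
left, right, chi2 = directional_chi_square(ratings)
significant = chi2 > 3.841  # tabled .05 value for 1 df (assumed level)
```

With these made-up ratings eight fall to the left and two to the right, which falls just short of the tabled value.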

Comparisons were made between cards by applying the analysis of variance for 
repeated measurements of the same Ss as described by Edwards”? to each of the 21 
scales. Since the assumption of the normality of the distribution of ratings on each 
of the scales is tenuous (the scales being limited to seven steps), the acceptable 
probability level was set at one per cent. The results of these analyses indicate that 


TABLE 1. SCALES YIELDING SIGNIFICANT CHI SQUARES 
I II III IV V VI VII VIII IX X 


beautiful-ugly mf¹ mf F MF 
clean-dirty mf F MF F 
fair-unfair 
good-bad mf 
happy-sad 
kind-cruel 
wise-foolish* 
brave-cowardly 
hard-soft 
heavy-light 
large-small 
masculine-feminine 
rough-smooth 
strong-weak 
active-passive 
angular-rounded M 
fast-slow F MF MF 
hot-cold MF M 
reckless-cautious F M 
sharp-dull M MF F MF 
tense-relaxed F 













1“m” denotes that a significant number of the ratings made by males were toward the right side 
of the continuum. “f” denotes the same thing for females. 

2“M” denotes that a significant number of the ratings made by males were toward the left side of 
the continuum. “F” denotes the same thing for females. 


2Tables listing the distributions of ratings for each card, the F ratios and probability levels for 
analyses of variance between cards, and mean ratings for males and females on all scales have been 
deposited with the American Documentation Institute. Order Document No. 6081, remitting $1.25 
for 35-mm. microfilm or $1.25 for 6 by 8 in. photocopies. 
for 19 of the 21 scales there are differences between cards which are significant at the 
one per cent level (differences between cards for the scales “brave-cowardly” and 
“reckless-cautious” failed to reach statistical significance). In four cases these 
differences may be attributable to significant interactions between the cards and the 
sex of the rater (significant interactions were found in scales “clean-dirty”, 
“happy-sad”, “hot-cold”, and “sharp-dull”) but in making ratings on the vast majority 
of scales Ss tend to perceive characteristic differences in the cards (see footnote 2). 
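
The between-cards comparison can be illustrated with a simplified repeated-measurements analysis of variance. The sketch below is an assumption-laden reduction: it treats a single group of Ss and ignores the sex factor and its interaction with cards, so it does not reproduce the full layout described by Edwards. The function name and the ratings are hypothetical.

```python
def repeated_measures_f(scores):
    """F ratio for differences between columns (cards) with the same Ss in every column.

    scores: one row per S, one column per card, all rows of equal length.
    Returns (F, df_cards, df_error).
    """
    n = len(scores)      # number of Ss
    k = len(scores[0])   # number of cards
    grand = sum(sum(row) for row in scores) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    card_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_cards = n * sum((m - grand) ** 2 for m in card_means)
    # Error term: variation left after removing S and card effects.
    ss_error = ss_total - ss_subjects - ss_cards
    df_cards = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_cards / df_cards) / (ss_error / df_error)
    return f_ratio, df_cards, df_error


# Hypothetical ratings of three cards by four Ss on one scale.
example = [[2, 4, 6], [3, 5, 6], [1, 4, 7], [2, 3, 6]]
f_ratio, df_cards, df_error = repeated_measures_f(example)
```

The resulting F would be referred to a tabled value at the one per cent level with `df_cards` and `df_error` degrees of freedom.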


DISCUSSION 


The results of the analyses of variance of the scales (see footnote 2) indicate that, 
as measured by the semantic differential, each Rorschach inkblot tends to be viewed 
distinctively. The fact that differences between sexes were significant in the case of 
only one scale suggests that, for the population sampled, sex differences do not play a 
major role in the way the stimulus is perceived. A consideration of the mean ratings 
for each scale on each card and the distributions of these ratings suggests a number of 
conclusions regarding the impact of the blots on the Ss sampled. 


CARD I. It is somewhat surprising and very interesting to note the degree of negative 
valence attributed to this blot (ugly, ry bse sad, cruel). In fact it is second only to Card IV 
on this score. In addition it is viewed as being a hard, large, strong, masculine stimulus which is 
also highly active, angular, fast, sharp and tense. These findings would seem to contradict the 
view of Bochner and Halpern“) that this blot is “conducive to relaxed feelings”. It may be that 
the generally negative and threatening impression that the blot induces is a function of its position 
as the first of the series. This is a question for future empirical study. 


CARD II. While the tendency is to evaluate this blot negatively, this is not done consistently 
by enough Ss to result in significant chi squares. Regarding the potency factor, the tendency is to 
characterize it as hard, large and strong but at the same time light and feminine. Both sexes view 
it as being high in the activity factor although females tend to attribute to it more of the 
qualities associated with this factor than do males. 


CARD III. This is the first of the blots to be endowed with positive qualities and the tend- 
ency is to view it as being clean, fair, good and happy. At the same time it is viewed as low in 
potency, being considered light, small and feminine. In fact for males it is viewed as the most 
feminine of the blots. This has considerable relevance for the clinical use of the test in which the 
failure of the male S to respond to the human-like figures on this blot as men is sometimes con- 
sidered suggestive of identification problems. This card is also consistently seen to be high in 
most of the activity scales (i.e., it is seen by both sexes as active, fast, hot and sharp). 


CARD IV. This is one of the cards about which many clinical hunches have been generated. 
The results of this study would suggest that it is indeed a blot which has a definite and consistent 
meaning for Ss. It is evaluated the most negatively of all the blots, being characterized by 
ratings as ugly, dirty, unfair, bad, sad, cruel and foolish. At the same time it is seen as being 
heavy, large, rough, strong and masculine, thus being very high in potency. There is little con- 
sistency in the ratings of the activity scales except that most Ss agree in the impression that it is 
slow. 

Many of the qualities attributed to this blot by our Ss appear to be consistent with the clin- 
ical impression that this card is the “father” card, its usual designation in the clinical use of the 
test. The reported failure to find ratings for card IV which were similar to those of the concept 
‘father’ ©) may well have resulted from the candor with which the two ratings are made. There 
is no social stereotype for Card IV while one definitely exists for ‘father’. 


CARD V. This card is generally evaluated rather neutrally. It is seen somewhat consistently 
as lacking in potency. Among the activity scales it is seen as active, fast and sharp. 


CARD VI. The tendency is to evaluate this on the negative side but there is no high degree 
of consistency on this score. The tendency to see it as a — ve is more marked with both 
males and females consistently viewing it as heavy and large. Likewise both see it as brave, 
masculine, and strong although only the females agree enough to result in a significant chi square. 
Among the activity scales it is generally looked upon as among the slowest, coldest, dullest, most 
rounded, cautious, relaxed and passive of the blots but the ratings are not made consistently 
enough in this direction to result in significant chi squares. 


CARD VII. Although not resulting in many significant chi squares, ratings on the evaluation 
scales are in the direction of the more positive qualities. On the potency scales this blot is seen 
consistently as being soft, light, small and the most feminine of all the blots. There is also a 
tendency to see it as being active and sharp, and females consistently look upon it as being fast 
and hot. This card is another which has been the object of much speculation and these findings 
seem | regs information to many of the clinical intuitions which have been built up through the 
use of the test. 







CARD VIII. This card is generally evaluated positively; however, the required consistency 
for a significant chi square was achieved on only two scales: clean—dirty and wise—foolish. The 
tendency is to see it as somewhat high in potency with both sexes seeing it as brave and females 
alone seeing it as masculine and strong. Among the activity scales both sexes saw it consistently 
as being active while females also saw it as being sharp and tense. 


CARD IX. The tendency is to evaluate this card negatively but there is enough incon- 
sistency to render all but one of the chi squares insignificant. As to potency it is seen as large by 
both sexes, soft by males and strong by females. Regarding the activity factor, it is seen as te | 
hot and sharp by females and rounded by males with the general trend for females being toward 
the more active ends of the continua and for males toward the more passive ends. 


CARD X. Of all the blots this one along with card III tends to be evaluated most positively. 
Both sexes consistently consider it to be beautiful and happy while females see it as clean and 
males view it as wise. It is considered to be low in potency with both sexes looking upon it as 
light and feminine. For both sexes this is one of the most feminine of the blots. pe f of the 
qualities which comprise the positive side of the activity factors are attributed to this so that 
it is seen by both sexes as active, fast and sharp. Males also consistently see it as angular, hot and 
reckless. 


Rabin’s “) study utilizing the semantic differential with the Rorschach inkblots 
appeared in print while the present paper was being written. His study differs from 
the present one in that his Ss made ratings of the projected images of the blots and 
were limited in the amount of time they had to check items. There was also some 
difference in the scales used. Most notably he failed to use the masculine-feminine 
scale. Comparing the 12 scales which were used in common, the present study 
shows strikingly similar results with regard to Cards I, III, IV, V, VII and X. In 
some cases on the other cards one study showed significant differences where the 
other revealed only suggestive trends. The general impression is that the findings in 
both studies are in close agreement. 

A major conclusion from this study is that the semantic differential is a useful 
device for developing a body of understanding of the stimulus value that the blots 
hold for S. A number of variables, among them age, sex, cultural background and 
emotional status of S may well be important in determining the impact of the blots. 
These are deserving of future investigations. 


SUMMARY 

Forty male and 40 female college students rated the standard Rorschach ink- 
blots on 21 of the semantic differential scales; seven from the evaluative factor, seven 
from the potency factor, and seven from the activity factor. Ratings were analyzed 
both for directional trends within cards through the use of the chi square technique 
and for differences between cards by the analysis of variance technique. Each blot 
was discussed in relation to the findings relevant to it. Many of the clinical im- 
pressions of the cards, particularly cards IV and VII, seemed to have been confirmed. 
In addition, card I was viewed as highly potent and active and was evaluated 
extremely negatively, which was unexpected. 
RORSCHACH GENETIC LEVEL AND PSYCHOTIC SYMPTOMATOLOGY¹ 
DAVID LEVINE 
University of Nebraska 


PROBLEM 


Several investigators ®: *: * 4) have demonstrated independently that Rorschach 
responses can be scored on a continuum of “‘psychological maturity” and that this 
dimension is related to the nature and outcome of serious mental disorder. This 
research rests on a theoretical foundation which integrates Heinz Werner’s develop- 
mental theory and psychoanalytic Ego psychology. The major tenets of this posi- 
tion“*® are first “that wherever development occurs, it proceeds from a state of 
relative globality and lack of differentiation to a state of increasing differentiation, 
articulation, and hierarchic integration,” and secondly, that much deviant human 
behavior can be explained in terms of the psychoanalytic concepts of regression and 
fixation. 

Specifically, the data have indicated that “‘process-like schizophrenics obtain 
lower Rorschach genetic-level scores . .. than . . . reactive-like schizophrenics” ®) and 
that, in schizophrenic patients, Rorschach genetic-level scores are positively cor- 
related with good hospital adjustment and active social participation“. It has also 
been found that schizophrenics who are discharged from the hospital within a year 
after their admission are likely to have higher Rorschach genetic-level scores than 
those who remain hospitalized. “ 

The present study is the result of dissatisfaction with the vagueness, as an 
explanatory concept, of the term “psychological maturity’. It is generally agreed 
that the schizophrenic patient, in most respects, is markedly different from the 
young child and that he has ‘“‘regressed in only certain respects’’.“ ”) An explora- 
tion of the specific variables which are related to the phenomenon of “regression” 
seems desirable. A first step in this direction can be achieved by comparing the 
Rorschach genetic-level scores of patients with different symptoms. For example, do 
patients who manifest the markedly regressive symptoms of bizarre postures or 
hallucinations have lower Rorschach genetic-level scores than patients who do not 
show these particular symptoms? 


METHOD 


The first five male patients admitted each week to a VA neuropsychiatric 
hospital with diagnoses of functional psychosis not in remission were selected for the 
study until a sample of 120 subjects had been reached. No patient was included who 
had been a psychiatric patient in any hospital for 90 days or longer during the pre- 
ceding six months. Within three days of admission each patient was interviewed 
by the author. After the interview, the Symptom Rating Sheet (SRS) devised by 
the staff of the Veterans Administration Psychiatric Evaluation Project was com- 
pleted.2 The SRS, made up of twenty symptom scales (see Table 1), furnishes an 
objective description of psychotic behavior. Jenkins, Stauffacher and Hester“ 
write, “for purposes of estimating the reliability of the ratings a cutting point was 
established for each scale and ratings below the cutting point were classified low, 
while ratings above the cutting point were classified as high ratings.’”’ These same 
cutting points were used in the present study. Thus, patients who were rated above 
the pre-determined cutting point were assumed to manifest the symptom in question, 


1This study was started as an Individual Hospital Project of the Veterans Administration Psy- 
chiatric Evaluation Project directed by Richard L. Jenkins. It was carried to completion with support 
furnished by the Research Council, University of Nebraska. The Rorschach scoring was carried out by 
a and Kenneth Stewart. Much of the statistical computation was accomplished by 

onald e. 

2For further information concerning the SRS or other of the Veterans Administration 
Psychiatric Evaluation Project, contact: Richard L. Jenkins, M. D., Veterans Administration Hos- 
pital, 2650 Wisconsin Avenue, N. W., Washington 7, D.C. 
while patients who were rated below the cutting point were assumed not to manifest 
this symptom. 

Within three weeks of admission each patient was given the Rorschach by one of 
six psychologists: five VA staff psychologists and an advanced trainee. The proto- 
cols were scored independently by two advanced graduate students. Patients who 
gave protocols of ten or fewer responses were excluded from the study. In order to 
check the reliability of the scoring, the Rorschachs of twenty patients were scored 
separately by both judges. After the Rorschachs were scored, each response was 
assigned a genetic-level score based on Becker’s? criteria. For his system, Becker, 
following Werner “*) and Friedman), constructed a six point scale on which each 
Rorschach response could be located on the basis of its genetic level. For example, a 
response to the entire card in a global, undifferentiated manner (an amorphous 
Whole response: Wa) which is most commonly found in four year old children“ 
would receive a genetic-level score of “1”. Illustrative of this kind of Wa response are 
“black paint” to Card I or “Fire and Smoke” to Card II. DW, W-, and Contam- 
inated Responses are also scored “1”. Amorphous responses to Usual Details (Da) 
receive a scale score of ‘‘2’’ since they reflect the attempted differentiation of the 
stimulus into parts. Vague responses to the Whole blot (maps, designs, etc.) are 
scored ‘‘3’’, because they reflect some “‘integrative effort with consideration of the 
formal aspects of the blot.” Popular responses or responses in which the shape of the 
image generally conforms to the blot area are level ‘4’. When two or more areas are 
articulated with ‘‘good form’’ a score of ‘‘5’’ is assigned; whereas, when an unbroken 
blot (I, IV, V, VI, IX) is “perceptually articulated and reintegrated into a good form 
percept,” the highest genetic level score of “6” is attained. More detailed discussions 
of this and closely related scoring systems are available in the literature. @ 5 1°) 

After the separate responses were each assigned a genetic-level score, a mean 
Rorschach genetic-level score was obtained for each subject by adding the scores and 
dividing by the total number of responses. The reliability coefficient of the mean 
Rorschach genetic-level scores obtained independently by the two judges was .87. 
The mean Rorschach genetic-level scores of patients who manifested a symptom 
were then compared with the scores of patients who did not manifest that symptom. 
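
The scoring arithmetic in this paragraph amounts to averaging the six-point response scores within each protocol. A minimal sketch follows; the function name, the constant, and the sample protocol are hypothetical, and the exclusion of protocols of ten or fewer responses is taken from the preceding paragraph.

```python
MIN_RESPONSES = 11  # protocols of ten or fewer responses were excluded


def mean_genetic_level(response_scores):
    """Mean of the 1-to-6 genetic-level scores, or None for an excluded protocol."""
    if len(response_scores) < MIN_RESPONSES:
        return None
    if any(score not in range(1, 7) for score in response_scores):
        raise ValueError("genetic-level scores must be integers from 1 to 6")
    # Sum of the response scores divided by the number of responses.
    return sum(response_scores) / len(response_scores)


# Hypothetical protocol of twelve scored responses.
protocol = [1, 2, 3, 4, 4, 4, 5, 3, 2, 4, 6, 4]
subject_score = mean_genetic_level(protocol)
short_protocol_score = mean_genetic_level(protocol[:10])
```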


TABLE 1. MEAN RORSCHACH GENETIC LEVEL SCORES OF PATIENTS WHO MANIFESTED SYMPTOM AS 
COMPARED WITH PATIENTS WHO DID NOT MANIFEST THE SYMPTOM 
Patients who were judged “unratable” on any symptom were excluded from the 
comparison. The statistical procedure followed is that recommended by McNemar 
(®, p. 223) for a comparison of the difference between uncorrelated means. A one-tailed 
“t” test was employed to arrive at the level of significance since the general theoreti- 
cal position states that patients who are more seriously disturbed will have lower 
Rorschach genetic-level scores. 
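
A comparison of uncorrelated means of this kind can be sketched as follows. The pooled-variance form of the t ratio is an assumption, since the text cites McNemar without giving the formula; the function name and the scores are hypothetical. The critical value of 1.895 is the standard tabled one-tailed .05 value for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """t ratio and degrees of freedom for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical mean genetic-level scores.
with_symptom = [3.0, 3.2, 2.8, 3.1]
without_symptom = [3.6, 3.4, 3.8, 3.5, 3.7]
t_ratio, df = pooled_t(with_symptom, without_symptom)

# One-tailed test: the prediction is a LOWER mean for the symptom group,
# so only a sufficiently negative t counts as support.
supports_prediction = t_ratio < -1.895  # tabled value for df = 7
```

Because the test is one-tailed, a difference in the unpredicted direction cannot reach significance under this procedure however large it is.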


RESULTS 

The means of the mean Rorschach genetic-level scores for the patients who mani- 
fested a symptom and for the patients who did not manifest that symptom are com- 
pared in Table 1. In only two of the twenty comparisons does the difference between 
means reach the five percent level of significance. However, the distribution of ob- 
tained “t’s” departs significantly from chance (P < .05) when the distribution of 
“t’s” is grouped into three intervals (cutting points chosen were at the P of .20) and 
the Chi Square test is applied. 

These findings tentatively suggest that patients with symptoms of apathy, lack 
of motivation, and lack of goal-directedness will tend to have lower Rorschach 
genetic-level scores than patients without these symptoms. On the other hand, for 
those symptoms which are generally regarded as “‘regressive” in nature, such as dis- 
organization of thinking and bizarre postures, the differences in Rorschach genetic 
level do not appear. Hallucinations, in fact, are more likely to be found in patients 
with high genetic-level scores. 

Thus, those symptoms which would be expected, according to the present 
theoretical formulation, to reflect most clearly this difference are empirically un- 
related to Rorschach genetic-level. Rather the obtained relationships suggest the 
Adlerian concepts of “social interest”’, ‘‘goal’”’, and “‘life-style”. Adler’s®: ». *) funda- 
mental proposition is that “every psychic phenomenon .. . can only be grasped and 
understood if regarded as a preparation for some goal.’’ He writes, further, that the 
psychotic’s ‘‘.... goal of success is a goal of personal superiority, and their triumphs 
have meaning only to themselves . . . the meaning they give to life is a private mean- 
ing .. .’’@- ». 56) The finding of a relation between a measure of “severity of mental 
disorder” and a lack of social interest or goal-directedness is, then, predictable from 
an Adlerian point of view. Those schizophrenics who have “given up’’, who are 
apathetic, who have no plans—they are the ones least likely to improve. Hallucina- 
tions, panic, bizarre ideas, and disorganization of the thought processes may repre- 
sent part of the struggle to re-establish a socially meaningful goal in life. 

Although one may take the position that apathy represents the most infantile 
attitude and therefore conclude that the most regressed schizophrenics would be 
expected to appear apathetic, the Rorschach literature presents us with a more 
direct, economical, and, on the face of it, more reasonable explanation. By making 
the straightforward assumption that the approach to the blots parallels the approach 
to life,“ it is possible to explain the empirical relationships found in the present 
study. The person who puts forth the energy to organize the blots in a meaningful 
fashion, and thereby achieves a higher Rorschach genetic-level score, also puts forth 
the effort to organize and plan his life; if a patient in a mental hospital, he will not 
accept his hospitalization passively, but will struggle to overcome those difficulties 
which he faces. The patient who “lacks goals” and appears apathetic will approach 
the blots with little drive and be satisfied with vague or poorly organized percepts. 


SUMMARY 
The present study suggests that the so-called Rorschach “genetic-level” score 
is an important psychological variable. The findings, however, cast doubt on the 
relation of the score to the underlying theory. Whereas a significant relationship be- 
tween psychotic symptoms of apathy and low Rorschach “genetic-level” score was 
obtained and, whereas there appeared to be no empirical relation between “re- 
gressive” symptoms and Rorschach genetic-level score, this score might be as aptly 
termed a Rorschach “energy-level” or “planning” score. Whatever terms are em- 
ployed, however, it would seem more direct, more economical and more reasonable to 
interpret the relevant empirical data within Adler’s theory of personality rather than 
within a theoretical framework which combines Werner’s developmental psychology 
and contemporary psychoanalytic theory. 
VERBAL AND NONVERBAL REINFORCERS IN THE RORSCHACH 
SITUATION 


M. G. MAGNUSSEN 


Aircraft Nuclear Propulsion Department 
General Electric Company, Cincinnati, Ohio 


PROBLEM 

Schafer“) has emphasized the interpersonal and dynamic interactions between 
the examiner (E) and the subject (S) in the testing situation. Some research @: ®) has 
been reported concerning the problem of just how and to what extent these inter- 
personal relations and examiner differences ® have affected test results. The present 
investigation was designed to test the general hypothesis that the examiner-subject 
(E-S) interaction is an important variable in evaluating test results. More spec- 
ifically, this study attempted to explore whether different cues given by the E are 
responded to without awareness by the S and eventuate in motivating or altering 
Rorschach productivity. 


METHOD 


The Ss were thirty-three males randomly selected from candidates being seen 
for personnel evaluations at the General Electric Company at Cincinnati, Ohio. The 
following criteria had to be met prior to inclusion in the study: (a) no prior Rorschach 
testing, (b) a minimum of a bachelor’s degree from an accredited university, (c) no 
history of psychiatric treatment or evidence of brain damage, and (d) no less than 
two responses per card for all ten Rorschach blots. 
Intellectual level was assumed to have been approximately equated by virtue 
of excluding Ss without college degrees from accredited universities. In addition all 
Ss used in the study were engineers and the mean level of education was five years of 
college with no significant differences in education level between the three groups. 
The mean age of all Ss was 32, range of 23-40. 

The Ss were individually administered the Rorschach cards under the standard 
procedure utilizing Beck’s“? instructions. The Beck norms of popular responses 
were used in the scoring of the Rorschach protocols and were the associations that 
the E reinforced under two of the conditions. Under the verbal reinforcement condi- 
tion (VR), the E replied, ‘‘Uh huh,” after each popular response. In the nonverbal 
reinforcement condition (NVR), the E nodded his head once after each popular 
response. Under the control condition (C) the test was administered with a con- 
scientious attempt not to provide any cues or reinforcements. A post-examination 
interview was conducted with each S to ascertain his degree of awareness of the study. 
None of the Ss verbalized any knowledge of the purpose or nature of the investiga- 
tion. 


RESULTS 
Comparisons of the mean number of popular responses were made under the 
three conditions and revealed heterogeneity of variance. This finding necessitated 
converting raw scores of each S to square root transformations. Table 1 shows the 
analysis of variance table based on the means and variances of the raw data and the 
square root converted scores. The analysis of variance of the square root transforma- 
tions for the effect of reinforcement (VR and NVR) yielded an F significant at the 
.05 level. The three conditions were compared against each other by individual t 
tests with the results that the VR condition gave significantly more popular responses 
than the C condition at the .05 level of significance. The NVR condition elicited 
more popular responses than the C condition at the .05 level of confidence; and, 
finally, the VR and NVR conditions did not show a statistically significant differ- 
ence. The results reveal that the verbal reinforcement (‘Uh huh’) and the non- 
verbal reinforcement (head nodding) increase the frequency of the preselected popular 
responses in a Rorschach test situation. In addition, the findings suggest that the 
verbal stimulus is no more effective than the nonverbal stimulus in terms of increasing 
the reinforced response. These results confirm the findings of Gross’s °) independent 
study reporting on psychiatric patients in which the three conditions conducted re- 
sulted in similar findings of both reinforcement groups increasing human content 
productivity when compared with a control group, but not producing significant 
differences in terms of whether the reinforcer was verbal or nonverbal. 


TABLE 1. ANALYSIS OF VARIANCE OF THE SQUARE ROOT TRANSFORMATION 
SCORES UNDER THE THREE CONDITIONS 

                                            Square Root 
                     Raw Scores            Transformation 
Condition          Mean   Variance        Mean   Variance 

VR  (N = 12)       9.61    12.01          3.61     .42 
NVR (N = 10)      10.91    25.98          4.00     .51 
C   (N = 11)       7.10     6.99          2.98     .41 
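
The square root transformation followed by a simple analysis of variance, as reported above, can be sketched as follows. This is an illustration under stated assumptions, not the author's computation: the function name is hypothetical, the raw counts are invented (they are not the study's data), and a plain one-way layout over the three conditions is assumed.

```python
from math import sqrt


def one_way_anova_f(groups):
    """F ratio with its degrees of freedom for independent groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented counts of popular responses per S under each condition.
raw = {"VR": [9, 16, 4, 9], "NVR": [16, 9, 16, 25], "C": [4, 9, 4, 1]}

# The square root transformation is applied to stabilize the variances
# before the analysis of variance is carried out.
transformed = [[sqrt(x) for x in scores] for scores in raw.values()]
f_ratio, df_between, df_within = one_way_anova_f(transformed)
```

The F ratio would then be referred to a tabled value with `df_between` and `df_within` degrees of freedom, and followed by individual t tests between pairs of conditions as the text describes.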


SUMMARY 


This study tested the effect of E behavior in the Rorschach situation. The re- 
search was designed to explore the results of verbal (VR) and nonverbal (NVR) 
reinforcement of Beck’s list of popular responses on the Rorschach. Thirty-three male 
engineers at General Electric’s Evendale, Ohio plant were randomly selected and 
administered the Rorschach under three different conditions. The VR condition was 
merely the statement by E of “Uh huh,” the NVR condition was a single head nod 
by E, and the control condition C was no reinforcement of the free associations of 
popular responses. The results revealed that both the VR and NVR conditions 
yielded significantly more of the reinforced popular responses than the C condition 
and that no significant differences existed between the VR and NVR conditions. The 
importance of E awareness of his behavior and its influence upon the E-S interaction 
and test interpretation was discussed. 
FIGURE LOCATION IN STUDENT AND PATIENT SAMPLES 
ROBERT E. TAYLOR¹ 
University of Tennessee 


PROBLEM 
Clinicians frequently attribute projective significance to the page location of 
drawings. Machover“) interprets left placement as self orientated, while the right 
of the page signifies the environment. The top is related to optimism, and the lower 
portion to depression. Should these hypotheses hold, deviant placement might be 
anticipated in a hospital population. This study compares the placement of geo- 
metric figures on a blank page by student and clinical groups. 


METHOD 


Subjects for this study were 50 patients from the Veterans Administration 
Hospital in Tuscaloosa, Alabama and 47 students in an introductory psychology 
class at the University of Tennessee. The experimental task was administered to 
patients of mixed diagnoses following a routine screening battery. Students were 
tested in the classroom prior to regularly scheduled lectures. Subjects were in- 
formed that the task was a current research project and did not constitute a part of 
their individual test batteries or classroom work. 

Subjects were provided with a pencil and blank paper positioned vertically in 
front of them. They were not allowed to write their names or other information on 
the paper before the test was administered. Instructions were as follows: (a) draw a 
horizontal line, (b) draw a vertical line, (c) draw a circle, (d) draw a square, (e) draw 
a triangle. Every subject was allowed to complete each portion of the task before E 
proceeded with the next part. All drawings were restricted to one side on a single 
sheet of 8½ x 11 inch white bond paper. Questions during administration were 
answered with a repetition of the instructions and/or comments to “draw it any way 
you like”. 


1The author wishes to express his gratitude to staff and trainees of the Psychology Service, Vet- 
erans Administration Hospital, Tuscaloosa, Alabama for their aid in obtaining a portion of the data, 
and to H. C. Rickard, in particular, for his encouragement in the preparation of the manuscript. 
Results were quantified by lining a transparent sheet into quadrants and placing 
this over individual productions. A tabulation was made of horizontal line placement 
with respect to location above or below midline of the page. Vertical line placement 
was similarly recorded for the left-right dimension. Individual geometric figures 
were assigned to one of the four quadrants on the basis of the quadrant containing 
the major figure area. 
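
The quadrant rule described above can be illustrated in code. This is a minimal sketch and not the author's procedure: it assumes each figure is approximated by a bounding box measured in inches from the top left corner of an 8½ x 11 inch page, and the function names and example coordinates are hypothetical.

```python
# Illustrative sketch only: each drawing is approximated by its bounding box.
PAGE_W, PAGE_H = 8.5, 11.0  # page size in inches, as given in the Method section


def overlap(lo1, hi1, lo2, hi2):
    """Length of the overlap between intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def major_quadrant(left, top, right, bottom):
    """Return the quadrant that holds the largest share of the box area.

    Coordinates are inches from the top left corner (y grows downward).
    """
    mid_x, mid_y = PAGE_W / 2, PAGE_H / 2
    # Each quadrant is stored as (left, top, right, bottom).
    quadrants = {
        "upper left": (0.0, 0.0, mid_x, mid_y),
        "upper right": (mid_x, 0.0, PAGE_W, mid_y),
        "lower left": (0.0, mid_y, mid_x, PAGE_H),
        "lower right": (mid_x, mid_y, PAGE_W, PAGE_H),
    }
    areas = {
        name: overlap(left, right, q_left, q_right)
        * overlap(top, bottom, q_top, q_bottom)
        for name, (q_left, q_top, q_right, q_bottom) in quadrants.items()
    }
    return max(areas, key=areas.get)


# Hypothetical circle drawn mostly above and to the left of the page center.
print(major_quadrant(left=2.0, top=3.0, right=5.0, bottom=6.0))
```

A bounding box is only a rough stand-in for "major figure area"; the transparent overlay described in the text relies on visual judgment of the figure itself.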


RESULTS 


Table 1 presents the data showing student and patient usage of page area in 
placement of horizontal and vertical lines. It is evident from inspection of the results 


TABLE 1. NUMBER OF PATIENT AND STUDENT SUBJECTS PLACING HORIZONTAL 
AND VERTICAL LINES IN PAGE AREA INDICATED 

                     Horizontal                  Vertical 
Groups         Top   Midline   Bottom     Left   Midline   Right 

Patients        45      0         5        39       2        9 
Students        45      1         1        34       4        9 

that the two groups do not differ significantly. Chi-square probabilities are greater 
than .2 and .7 for horizontal and vertical lines respectively. The preference for the 
top half of the page is more marked than the definite left vertical preference in both 
groups. A further breakdown of the data in terms of distance from midpoint on the 
page shows no significant difference between the groups. The horizontal line, when 
placed at the top, was more distant from the midpoint than was the vertical line 
when placed to the left. This is consistent with the absolute frequencies shown in 
the table. 
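
The comparison of the two groups can be sketched as a Pearson chi-square on the Table 1 counts. This is an illustration under stated assumptions, not the author's computation: the article does not say how the sparse midline cells were handled or whether a continuity correction was applied, so the values printed here need not match the reported probabilities. The function names are hypothetical.

```python
import math


def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch handles only df = 1 or 2")


# Counts from Table 1; first row is patients, second row is students.
horizontal = [[45, 0, 5], [45, 1, 1]]  # top, midline, bottom
vertical = [[39, 2, 9], [34, 4, 9]]    # left, midline, right

for name, table in (("horizontal", horizontal), ("vertical", vertical)):
    stat, df = chi_square(table)
    print(f"{name}: chi-square = {stat:.2f}, df = {df}, p = {p_value(stat, df):.2f}")
```

With several expected frequencies below 5, the uncorrected statistic is only approximate; merging or dropping the midline column is one common way such cells were handled.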


TABLE 2. NUMBER OF PATIENT AND STUDENT SUBJECTS PLACING MAJOR 
PORTION OF GEOMETRIC FIGURES IN QUADRANT INDICATED 

                        Upper   Lower   Upper   Lower 

Circle     Patients       33       3      12 
           Students       17      12      10 

Square     Patients       21       7      17 
           Students       22       6      12 

Triangle   Patients       17      10      18 
           Students       14      14      13 

Table 2 shows frequencies of geometric figure placement. The most consistent 
trend evident in this table is the sparing use of the lower right quadrant for all 
figures. Chi-square probabilities are: less than .01 for the circle, greater than .95 for 
the square, and greater than .8 for the triangle. 

It should be noted that the data show striking similarities in the two popula- 
tions despite differences in age, educational level, mental status, sex composition, and 
socio-economic status. The difference in circle placement may be attributable to 
sampling error. Even in this instance, the upper left quadrant is most frequently 
used in both groups. 


DISCUSSION 
The results of this experiment contradict the hypothesis that productions of 
geometric figures evoked in relatively unstructured situations have potential for 
differentiating populations varying across several parameters. This does not appear 
due to the simplicity of the task per se in that King, for one, was able to demon- 
strate significant differences between normals and patients on several simple motor 
tasks such as finger tapping and reaction time. There are implications with respect 
to the hypotheses advanced concerning the meaning of placement in the production 
of various figure drawings“: *», Swensen“? comments on the apparent lack of sup- 
port for the placement hypotheses in his review of this area. 

The most parsimonious explanation for the consistency of the findings seems to 
lie in the postulation of a strong habit preference, most likely ingrained in early 
schooling, for an upper left approach to the blank page when the subject has a pencil 
in hand. This learned behavior appears to override the possible influence of “‘dy- 
namic” determinants in the situation discussed. 


SUMMARY 
Fifty hospitalized veterans and 47 college students were instructed to draw 
horizontal and vertical lines, a circle, square, and triangle on standard white bond 
paper. With the exception of the circle placement, the groups show markedly similar 
preference for utilization of the upper left portion of the page. The importance of 
common learned behavior in limiting the range of ‘““dynamically’’ determined choices 
is discussed. 
A NOTE ON THE INFLUENCE OF THE SEASON ON TREE DRAWINGS 
ABE J. JUDSON AND BARBARA W. MAC CASLAND 
Utica College of Syracuse University Marcy (N. Y.) State Hospital 


PROBLEM 

It is recognized that irrelevant factors, i.e., factors other than the test materials 
and the personality structure of the subject, can influence projective test results. 
The purpose of this study is to determine the effect of one such factor, namely, the 
season of the year, on the drawing of a tree. Our hypothesis is that bare trees are 
more likely to be drawn in winter than summer. 

Although none of the major investigators of tree drawings emphasize bareness 
as such, the presence or absence of foliage may have significance. Thus, Hammer 
@, p. 9) notes that “In a general way, the overall impression conveyed by the 
branches correlates with the broad personality setting of the subject, whether the 
branch or foliage treatment is composed of lively, animated and soft effects, or 
angular, harsh and stern outlines, or jerky, irritable anxious and insecure treatment 
—the drawing page serves as a canvas upon which the subject sketches his more 
enduring personality mood.” Koch: ». *) writes that ‘even without investigation 
of the details one can receive an impression of harmony, of unrest, of emptiness, of 
baldness or of fullness.’’ 


On the House-Tree-Person test, a bare tree may influence interpretation in- 
directly. In response to the question during the inquiry ‘“What is the weather like 
in this picture?’’, a patient who has drawn a bare tree is more likely to consider the 
weather forbidding and this response could have interpretive significance. Buck 
recognizes that the weather conditions at the time of testing may influence the 
patient’s response to the inquiry and, by implication, to the test proper. “A sub- 
ject’s description of story weather which duplicates exactly or almost exactly the 
weather actually pertaining outdoors at the time of the interview may be influenced 
solely by that weather”: ». 17, Koch®) who specifically asks his subjects to draw 
a fruit tree, does suggest that the season may influence the result but regards the 
possibility as remote. 

If tree drawings are influenced by seasonal factors, Buck’s view that the tree 
“symbolizes the individual’s feeling (conscious or unconscious) of himself in relation 
to his environment’’“: ». !7) will be slightly weakened. Only slightly weakened be- 
cause a great many aspects of the drawing other than foliage enter into the inter- 
pretation. 


METHOD AND RESULTS 

Twenty H-T-P records for each month were collected from the psychology 
department files, providing a total of 240 cases.1 Men and women were equally 
represented and the most recent cases for each month were selected. For scoring 
purposes, only two categories were used, the presence or absence of foliage. Any 
drawing in which an attempt was made to portray foliage, if only by a simple circle 
attached to the trunk, was placed in the foliage group. In a set of 100 drawings, two 
scorers disagreed on the placement of only one drawing. Patients represented a 
variety of psychiatric diagnoses. 

TABLE 1. THE RELATION OF PRESENCE OR ABSENCE OF FOLIAGE IN 
TREE DRAWINGS TO SEASON OF THE YEAR 

                                Men 
                        Foliage    Foliage 
Season                  Absent     Present 

Dec., Jan., Feb.          14         16 
Mar., April, May          11         19 
June, July, Aug.           8         22 
Sept., Oct., Nov.         13         17 

Chi Square = 2.96 
p > .05 

Results are presented separately for men and women in Table 1 and are grouped 
into four categories of three months each, corresponding roughly to the four seasons. 
Because age affects results (there is a significant tendency for older women to draw 
bare trees and a similar but not significant difference for men), the mean age of the 
Ss in each season is listed in Table 1. When tested by analysis of variance, these age 
differences were not significant. 

Although there is a trend in the expected direction for men, the chi square of 
2.96 is not significant. If we compare only the frequencies for the winter months of 
December, January and February with those for the summer months of June, July, 
and August, the resulting chi square of 1.79, corrected for continuity, has a p value 
of < .10, using a one-tailed test of significance. Although the difference fails to reach 
an adequate significance level, it does provide some support for the hypothesis. For 
women the prediction is confirmed. The chi square of 13.82 is significant beyond the 
.01 level. 
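
The two chi-square values reported for men can be recomputed from the men's frequencies in Table 1. The sketch below is illustrative: the function name and the season labels are hypothetical shorthand, and the one-tailed probability is obtained by halving the two-tailed value for one degree of freedom.

```python
import math


def chi_square(table, yates=False):
    """Pearson chi-square for a table of counts.

    yates=True applies the continuity correction (subtract 0.5 from each
    absolute deviation before squaring).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / expected
    return stat


# Men's counts from Table 1: (foliage absent, foliage present) per season.
men = {
    "winter": (14, 16),  # Dec., Jan., Feb.
    "spring": (11, 19),  # Mar., April, May
    "summer": (8, 22),   # June, July, Aug.
    "autumn": (13, 17),  # Sept., Oct., Nov.
}

all_seasons = chi_square([list(counts) for counts in men.values()])
winter_vs_summer = chi_square(
    [list(men["winter"]), list(men["summer"])], yates=True
)

# With 1 degree of freedom the two-tailed p is erfc(sqrt(chi2 / 2));
# halving it gives the one-tailed value for a directional hypothesis.
one_tailed_p = math.erfc(math.sqrt(winter_vs_summer / 2)) / 2

print(f"all four seasons: chi-square = {all_seasons:.2f}")       # 2.96
print(f"winter vs. summer, corrected: {winter_vs_summer:.2f}")   # 1.79
print(f"one-tailed p = {one_tailed_p:.3f}")
```

Both statistics agree with the values given in the text, and the one-tailed probability falls between .05 and .10.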

If, as these results suggest, the season influences the type of tree drawn, we 
might expect that more evergreens will be drawn during the Christmas season than 
at other times of the year. However, there were only 15 evergreens drawn in the 
entire group and these were randomly distributed throughout the year. The results, 


1We wish to thank Miss Sara Coxe for her help in data collection. 
particularly for women, suggest that the season does influence the tree drawing and 
this should be taken into account before ascribing any significance to a bare tree. 
Less significance should be attached to the drawing of a bare tree in winter than in 
summer. 


SUMMARY 
It was hypothesized that on tree-drawing tests, as a result of seasonal influence, 
more bare trees would be drawn in winter than summer. The results for women con- 
firm this expectation and the results for men are in the expected direction but fall 
short of the 5% level of significance. The influence of the season should be consid- 
ered before attaching significance to the drawing of a bare tree. 
USE OF THE DESPERT FABLES WITH DISTURBED CHILDREN 
H. E. PEIXOTTO 
The Catholic University of America, Child Center 


INTRODUCTION 

A number of different projective tests are being used to evaluate children by 
both psychiatrists and psychologists in spite of the fact that few if any empirical 
studies are available to guide them in interpreting the results of these tests. The 
present investigation is the third in a series“: * on one such test: The Despert 
Fables. It studies the type of response given to the test by emotionally disturbed 
children who have been accepted by a child guidance out-patient clinic, and it com- 
pares their responses with those of a normal population“). Other empirical evalua- 
tions of this test“: *) have discussed and interpreted the responses of particular types 
of patients, without indicating whether or not their responses differ from those of a 
normal population. 


PROCEDURE 

The Despert Fables are the most frequently used projective content test at the 
Child Center of the Catholic University within the age range of 3 to 10 years. They 
are seldom used with patients whose chronological age is over ten; although there are 
in this study three subjects thirteen years of age or older among the patients, one boy 
and two girls. The subjects in this study were all patients at the Child Center of 
Catholic University during the years 1949 through 1957. The sample is so small that 
subdividing it according to diagnosis as well as age and sex was not feasible. How- 
ever, the personality type most frequently diagnosed in the clinic is passive-aggress- 
ive reaction. Table 1 describes the subjects of this study. 


TABLE 1. DISTRIBUTION OF SUBJECTS 

              CA           CA 
           6-8 yrs.     9-12 yrs.     13+     Totals 
Girls         11           12          2*       25 
Boys          36           21          1*       58 
Totals        47           33          3        83 

*Not included in statistical analysis. 
RESULTS 

The responses to the Fables1 were tabulated, as far as possible, in accordance 
with the categories used in obtaining norms from a standardization population. 
Among the twenty Fables, only four new categories had to be added. These occur for 
Fables II, IV, XVIII. However, evasive responses were given much more frequently 
in the patient population than in the normal population so that this category had to 
be included for four additional fables so that it now occurs twenty times. Percentage 
frequencies were computed for male and female subjects separately for the two 
younger age groups.2 The N (1 boy and 2 girls) in the oldest group is so small that 
percentages would be meaningless. It is impossible to summarize data of this kind 
in any meaningful way. The best which can be done is to point out the most interest- 
ing and/or significant differences between the responses of the experimental (patient) 
group and those of the control (normal) group. To this purpose 115 differences out of 
a total of 430 were chosen by inspection for further consideration. Fifty-eight of the 
115 items proved to be significant: 31 at the .01 level or better, 9 at the .02 level and 
18 at the .05 level. 


DISCUSSION 

Fable I. Although the responses of the experimental group to this fable are 
often at variance with the expected response from normals, only one such difference is 
significant and this at only the .05 level. A significantly greater number of the patient 
group of younger boys than the corresponding girls would go to both mother and 
father. Apparently the boys have more trouble in expressing a preference between 
the parents than do the girls, suggesting that oedipal feelings have not yet been re- 
solved. The opposite trend in regard to this response for the sexes is shown by the 
normal group. Dynamically, there seems to be little difference between the response 
“To a tree” and “Away”, both of which would be interpreted as independent re- 
sponses. It appears, then, that an independent response to this fable is the most 
frequent for both groups at each age level and for both sexes. The tendency for the 
older group to give more dependent responses than the younger group occurs in the 
patient as well as in the normal group. It may be concluded, then, that this fable is 
of little value in discriminating between a patient and normal group. This finding 
combined with its low reliability ©) suggests that the fable is of questionable value. 
Evaluation: Poor. 


Fable II. This fable shows some very significant differences between the pa- 
tient and normal group as well as adding a new response other than “‘Evasive”’ for the 
patient group. In both age groups of patients a fairly sizable number of boys express 
openly a desire to ‘‘Get away from the parents’. This suggests more direct rejection 
and hostility than is found in a normal group. Practically none of the patient group 
takes cognizance of the party and although the normal group which does do so re- 
sponds by withdrawal, the significant differences here seem to be important. An 
evasive response, ignoring the party, is the most frequent response for the patient 
group at both age levels, while it is only the most frequent response for the younger 
age group among the normal subjects. A significantly greater number of girls in both 
patient age groups give an evasive response. The only two girls in the age group 
thirteen and older also gave an evasive response. This fable shows some promise in 
discriminating between a patient and a normal group, especially for girls. More in- 
formation is needed, however, to interpret the evasive response in accordance with its 
meaning to the subject. Dynamic: Rejection; Evaluation: good. 


Fable III. In general this fable is one of the most satisfactory in the series. 
Although the majority of the patient group as well as of the normal group give the 
mature response, significantly fewer boys in the younger patient group give this re- 
sponse than in the comparable normal group, while a significantly greater number 


1Fables are given in previous publications“), 
2These results may be obtained in tabular form from the author. 
give an immature or evasive response. However, the fable appears to have little if 
any discriminative value for girls. The fact that so large a percent of even the patient 
group give the mature response to this fable suggests that increased clinical signifi- 
cance might be attached to the poorer responses. On the other hand in most cases, 
this fable is not sensitive to emotional disturbance. Dynamics: Maturity rejection. 
Evaluation: Good. 


Fable IV. This fable is presumed to measure identification of the child with 
other members of the family®). However, it is not a highly reliable fable® and 
hence one can place less confidence in the provocative differences which have been 
found than would otherwise be the case. Girls in the younger patient group do not 
identify with the mother, the difference from the normal group being highly signi- 
ficant. Therefore, it might be surmised that younger girls in the patient group seem 
less mature and slower in resolving oedipal feelings than do those in the normal group. 
They appear to take refuge in evasive responses. Approximately the same (and the 
largest) per cent of patient and normal subjects identify with the father. This fable 
seems to be of minimal discriminatory value for the younger girls and to have no 
such value for boys. Evaluation: Poor. 


Fable V. This is the first of the fables considered by Fine ®? to reflect hostility; 
however, there is some evidence that this hostility may be more definitely described 
as sibling rivalry for some subjects while for others the main dynamic seems to be 
fear of hostility or retribution. While there are no significant differences between the 
patient and the normal group for the more popular responses, there are some signi- 
ficant differences among the less frequent responses. A significantly larger number of 
younger boys in the patient group express open hostility to the father and to the 
baby. The older girls in the patient group concentrate their responses among ‘‘Dog’’, 
“Sibling” and ‘‘Mother’’. The only instance where the lack of frequency is significant 
among the remaining responses is to the response “Family”. Apparently girls con- 
centrate their hostility more and express less hostility to the father. It is possible this 
may explain the significantly fewer responses to “Family” since such a response 
would include “‘Father’’. This fable seems to substantiate the finding that unresolved 
oedipal feelings predominate in the patient group. Dynamics: Hostility retribution. 
Evaluation: Good. 


Fable VI. This fable appears to add little or nothing to the group of fables, 
especially in view of the more significant findings obtained from Fable III. The only 
significant difference between the patient and normal group for this fable is that the 
girls in the older normal group give more emotional responses than do those of the 
patient group. The difference does not have diagnostic value. The obvious inter- 
pretation is that the patient group identifies better with the characters in the stories 
thus projecting their real feelings and needs, while the normal group does not so 
identify. Evaluation: Inadequate. 


Fable VII. This is one of the more discriminating fables, and has very good 
reliability“) for the two age groups represented here. Although among all the sub- 
jects in both groups the most frequent wish is for more material things, a significantly 
larger percentage of girls in the younger patient group respond according to the 
pleasure principle, while conversely, the percentage of this group showing super-ego 
development is significantly smaller. None of this patient group expresses a desire to 
“Go home’, which response, too, is significantly different from that of the com- 
parable normal group. Dynamics: Pleasure principle. Evaluation: Good. 


Fable VIII. This is the second of the three fables which reflects hostility. 
Despert believes it specifically investigates death wishes. There are some ex- 
tremely interesting differences found in the responses to this fable but its diagnostic 
significance is questionable because of the very limited reliability ®. The girls in the 
younger patient group seem completely unable to express this kind of hostile wish 
toward either parent, the differences being significant between boys of the same 
age group, and girls of the normal group. However, girls in the older patient group 
express such feelings with significantly greater frequency than girls in the comparable 
control group. None of the girls in this older patient group expressed these hostile 
feelings toward brothers, while boys in the patient group, and boys and girls in the 
control group all gave this response. Both girls and boys in the normal group resorted 
to expressing hostility to persons outside the family significantly more frequently 
than did either boys or girls in the comparable patient group. Apparently direction of 
expressed hostility or death wishes is one of the better diagnostic indices. If further 
investigation should indicate this fable has greater stability than heretofore indicated, 
it would be one of the more valuable fables. Dynamics: Hostility. Evaluation: Good. 


Fable IX. The main theme of this fable is fear. It is reliable for younger 
children“) but not for older children. Although fewer patients in both age groups 
expressed fear of the dark, only in one case (when younger boy patients and controls 
were compared) was this difference significant. A significant number of younger male 
patients expressed fear of persons than did the comparable group of normal males. 
However, this difference was significant at only the .05 level of confidence. The 
largest number of significant differences for this fable were in fear of self. Again, 
surprisingly, the larger number of such responses were given by the normal group. 
The differences were significant between younger female, older males and females. 
Fear of solitude was a response given by the patient group but never by the normal 
group. The younger girls gave this response most frequently. Dynamics: Fear. 
Evaluation: Good. 


Fable X. This is the only fable designed to investigate castration fears). 
Fortunately it is a reliable fable. There were several significant differences found 
between the normal and patient groups although these differences may not be in 
expected directions. Significantly (at the .05 level) fewer girls of the patient group 
responded that the elephant became bigger, while significantly (at the .02 level) 
fewer of the boys of the patient group thought the elephant was damaged. More 
girls than boys avoided the issue by giving evasive responses, and the number of girls 
in the patient group giving an evasive response was much greater than girls of the 
normal group. All of these significant differences were for the younger group. There 
were no significant differences found for the older group. These results suggest that 
castration fears arouse more conflicts in girls from 6 to 8 years of age and that they 
tend to repress these fears. Dynamics: Castration fear. Evaluation: Good. 


Fable XI. This fable is the second of four presumed by Fine®? to investigate 
dependency. Like Fable I, also considered to measure dependency, this fable has 
doubtful reliability. The only group for which it is reliable, the younger group, shows 
no differences between the patient and normal groups for either sex. The only signi- 
ficant difference which is shown is that older boys in the patient group give signifi- 
cantly fewer emotional responses than do boys in the comparable normal group. This 
finding again suggests that the patient population is identifying with the child in the 
story more adequately than does the normal population: the latter projects imagined 
feelings of the character rather than their own feelings to the character in the story. 
Evaluation: Poor. 


Fable XII. This is the last of the trio of fables which measures hostility. Like 
the other two fables measuring hostility, there are several significant differences be- 
tween the patient and control groups. Significantly more boys than girls in both the 
patient groups express hostility to the father. However, compared with the normal 
group, significantly fewer girls in the younger patient group are able to express hostil- 
ity to the father. This may be interpreted as prolonging the oedipal phase in the girls 
with emotional disturbances. Among the older subjects the boys and girls of the 
normal group both express significantly more hostility to siblings than do patients. 
This finding confirms in part those on Fable VIII. Older girls among the patients 
express hostility to the “Baby” with significantly greater frequency than do the 
normals. The same trend is shown for the boys but the difference is not significant. 
The actual percentages are about the same for the patient groups at both ages but 
with the normal subjects this response drops out in the older group. This finding 
suggests difficulty in growing up, in giving up the baby role among the patients. The 
dynamics expressed here seem to be clinically different from the sibling rivalry dis- 
cussed above. Dynamics: Oedipal feelings/maturity. Evaluation: Good. 


Fable XIII. Like Fable VII this story is supposed to elicit wishes. It is a 
highly reliable fable®) but quite unsatisfactory for discriminating the patient from 
the normals. When the normal group alone is considered this fable seems much more 
satisfactory than does Fable VII“). The present results cast doubt on the former 
conclusions. In actual practice the most satisfactory responses are to the direct 
question: ‘‘Name three wishes’’, rather than to give the wishfulfillment phantasies 
evoked by the fable. Evaluation: Inadequate. 


Fable XIV. This fable shows no significant difference between the patient and 
normal group to any response. In previous studies ®: *) this fable was found to be of 
minimal value and there is nothing from the present results to make it more valuable. 
Evaluation: Inadequate. 


Fable XV. Both Despert“? and Fine®? conceive of this fable as eliciting in- 
formation concerning the oedipal complex, although other fables also seem to elicit 
similar information“, The only significant difference found between the two groups 
for this fable is that more of the younger normal girls blame father’s anger on mother 
than do the comparable group of patients. More of the girls in this patient group con- 
sider father to be angry because he was omitted. This response seems to have the 
greatest frequency resulting in the significant difference described above. This type 
of response may reflect oedipal feelings expressed indirectly by girls of this age group. 
However, interpretation of this fable seems to be so ambiguous that its value appears 
quite limited. Evaluation: Poor. 


Fable XVI. Although this fable is considered by Fine®? to give information 
concerning fears and/or wishes, it has been considered by others“? a better measure 
of hostility. However, since it is not very reliable and differentiates between the 
patient and normal groups only minimally it would appear to be expendable. A sig- 
nificantly greater number of boys in the younger patient group respond with “‘Fath- 
er” or “Family” than those in the normal group. However, more boys in the older 
normal group give this response than in the younger normal group so that judging 
merely by frequency of response rather than by its dynamic connotations, one might 
interpret this datum as greater maturity on the part of the disturbed boys. Such an 
interpretation seems absurd; on the other hand, such ambiguous results make the 
fable of doubtful value. Evaluation: Inadequate. 


Fable XVII. This is the last and most reliable® of the fables considered to 
measure dependency. However, the results obtained here do not confirm the value of 
this fable. None of the differences measured between the patient and control group 
appeared to be significant at even the .05 level of confidence. These results are con- 
sistent with those found for Fable XIV. It seems possible from the responses that 
instead of dependency, these two fables reflect conflict around the anal period of 
psycho-sexual development. If this is so, it appears that such conflict is not signifi- 
cantly different in these two groups of subjects. In summary, the “dependency test”’ 
of Fine“, consisting of fables I, XI, XIV and XVII, is not very satisfactory. Eval- 
uation: Poor. 


Fable XVIII. This fable, like Fable VII investigates the fears of the child. As 
in Fable VII, there are several interesting and significant differences between the 
groups. Fear of oneself is expressed with greater frequency by the normal than by 
the patient groups in every instance and is significant in three out of the four possible 
comparisons. Older girls in the patient group do not express fear of animals as fre- 
quently as do the girls in the comparable normal group. The boys in this age group 
express fear of self aggression with significantly greater frequency than do those in 
the comparable normal group. This finding regarding the boys in the patient group 
can be interpreted as indicating lack of controls from within. It is more difficult to 
interpret this result concerning the girls. It appears to bespeak repression or pro- 
jection to the supernatural, neither of which definitely depicts dynamics. Dynamics: 
Controls. Evaluation: Good. 


Fable XIX. Like Fables VII, IX, XIII, XVI and XVIII, this fable is supposed 
to get at the phantasies of the child. However, unlike the preceding four fables in 
this area, this one does not set the stage for the hedonistic tone of the phantasy, yet 
it is one of the most reliable fables“. However, like the other fables investigating 
phantasy, it does not differentiate between the patient and the normal groups as well 
as some of the other fables. The younger group of male patients phantasy self pain 
significantly more often than do those of the normal group. A similar reaction is 
present for the younger girls but the difference between the frequency of their re- 
sponses and those of the control or normal group is not significant. On the other hand, 
by far the largest percentage of both patient and normal subjects phantasy self- 
pleasure. The older boys respond with the answer ‘‘New Baby” significantly more 
often than does the comparable normal group. This could mean that the advent of a 
baby in the family for emotionally upset youngsters is cause for conflict (see Fables 
V and XII) or that a proximate cause for disturbance in children seeking out-patient
help is the advent of a new baby. Dynamics: Pleasure principle. Evaluation: Good. 


Fable XX. According to Fine this fable was added to investigate sibling rivalry 
in conjunction with Fable VI. Depending on the response, however, it frequently 
gets at parental rejection rather than sibling rivalry thus making it an adjunct to 
Fables II and III. According to these results, a significantly larger number of boys 
in the older patient group would send away the older child, while among the younger 
male patient group a significantly larger number than in the normal group would 
send away the younger child. These responses suggest feelings of rejection more 
strongly than feelings of sibling rivalry. A significantly larger number of normal boys
in both the younger and older groups respond with ‘Neither’, suggesting that sibling
rivalry and/or rejection are repressed by these subjects. The younger patient group
gives more ‘Evasive’ responses than does the control group. How this response 
differs dynamically from the response “Neither” is not clear. Dynamics: Rejection. 
Evaluation: Good. 


CONCLUSIONS

Based on a comparison of the responses of 83 child clinic patients with normal
groups, it seems that the Despert Fables are unsatisfactory as a means of differential
diagnosis. In most cases both the normal and patient groups have the same “popular”
responses. In some instances there are significant differences between the two
groups on these popular responses but none has sufficiently high reliability for a
particular diagnosis. Differences among the “popular” responses occur for seven
Fables: II, III, VII, VIII, XII, XVIII and XX. In six of these Fables: II, VII,
VIII, XII, XVIII and XX, significant differences were found between both the popular
and unusual responses.

Significant differences occur for the unusual responses alone on eleven of the
twenty Fables. These are Fables: I, IV, V, VI, IX, X, XI, XIII, XV, XVI and XIX.
When the six fables mentioned above having differences for both the popular and
unusual responses are included here, it is evident that by far the majority of significant
differences occur among the unusual responses. Two fables, XIV and XVII,
do not discriminate between the patient and normal groups at all.

In view of these results it appears that the Despert Fables may reflect certain 
psychodynamic content of a child’s conscious or unconscious, but that they are of 
little value for differential diagnosis. It may also be concluded that the psycho- 
dynamic content of patients and non-patients varies very little and that there is no 
simple etiology for emotional upsetness in children. This conclusion is in disagreement
with that of Despert and Fine; this may be because of the limited comparisons
made in their studies.

THE INTELLECTUAL FUNCTIONING OF POSTPOLIOMYELITIC
PATIENTS 


LEONARD V. WENDLAND, ALBERT H. URMER AND H. WILLIAM SAFFORD 


Respiratory and Rehabilitation Center for Poliomyelitis
Rancho Los Amigos Hospital, Downey, California


INTRODUCTION 

A traumatic incident such as contracting poliomyelitis has been shown to have 
sufficient effect upon the personality as to throw some doubt on the interpretation of 
the results in commonly utilized tests. In conjunction with this finding, interest was 
manifested in the intellectual function of the postpoliomyelitic patient. If the post- 
poliomyelitic patient functions below his preonset level on a particular instrument, 
then care must be taken in utilizing this instrument for prediction, since the pre- 
dictive accuracy of the instrument may be greatly reduced. This investigation had a 
dual purpose, (a) to evaluate the level of intelligence of a sample of the hospital 
respiratory-poliomyelitic population on the basis of their intellectual function, and 
(b) to determine if their function has changed from that of prepoliomyelitis onset. 


PROCEDURE 


The verbal portion of the Wechsler Adult Intelligence Scale (WAIS) was ad- 
ministered to 93 patients of whom 77 were inpatients at the time of the testing. The 
initial purpose of the research was to study the relation between intelligence test 
scores and the following variables: (a) sex, (b) age at onset, and (c) length of time 
between onset and testing. In conjunction with the above, it was necessary to trace 
back into the patient history to get test results which might give us indices of pre- 
onset intellectual function. Since patients varied in age from below 20 years to ap- 
proximately 50 years, this tracing process proved rather difficult. A total of 20 
patients were found on whom preonset intelligence testing was available, including 
12 patients on whom Otis scores (various forms) were obtained and 8 patients on 
whom California Test of Mental Maturity scores were available. Both tests have
been adequately validated, which made their use feasible. The only caution necessary
concerns the comparison of these group tests to an individual test (WAIS).

To analyze the intellectual level of our sample with the three variables men- 
tioned above, the analysis of variance technique was utilized. To investigate pre- 
and postonset intellectual function, it was necessary to convert the actual test 
scores on the various measures into a common ordinal scale in order to utilize some 
statistical method of comparison. The method utilized was to consider the mean 
score as given by the test as equivalent to a score of 50 and essentially convert all 
our scores to T scores. Therefore, a standard deviation of one on any of the measures
would be equivalent to 10 score points in our new scale. This method was considered 
valid as no assumptions were made about the distribution of the test, rather the 
assumption was about the population. Presumably a person being tested on a series 
of tests, yet still remaining in the same population will fall within the same relative 
area in the normal distribution curve. Therefore, the type of test has no bearing on 
our new scale score as the person will still have the same relative ranking to the 
population as a whole. By converting to T scores these data were normalized, making 
the assumptions more valid. 
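
The conversion described above can be sketched in a few lines. This is only an illustration of the stated rule (test mean maps to 50, one standard deviation maps to 10 points); the function name and the norm values used in the example call are assumptions for illustration, not figures taken from this study.

```python
def to_t_score(raw, test_mean, test_sd):
    """Map a raw test score onto the common scale: mean -> 50, one SD -> 10 points."""
    if test_sd <= 0:
        raise ValueError("test_sd must be positive")
    z = (raw - test_mean) / test_sd  # distance from the test mean in SD units
    return 50.0 + 10.0 * z


# Hypothetical example: an IQ-type scale with mean 100 and SD 15.
print(to_t_score(115, 100, 15))  # 60.0
```

Because each score is expressed relative to its own test's mean and standard deviation, scores from different instruments become comparable on one scale, which is the property the comparison of pre- and postonset function relies on.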


RESULTS 


Table 1 indicates the IQ means on the WAIS for the population. The analysis 
of variance indicated that there was a significant difference between age groups as 
shown by an F of 4.45 which exceeded the .01 level of confidence. There was no 


TABLE 1. WAIS SCORES

                                     Mean IQ     N
Patients                               109       93
  Male                                 113       46
  Female                               106       47
Length of time, onset to testing
  0 - 6 months                         104       22
  7 - 12 months                        107       20
  13 - 24 months                       111       19
  2 - 5 years                          113       18
  6 years and over                     112       14
Age at onset
  20 years and under                   103       19
  21 - 25                              104       22
  26 - 30                              110       20
  31 - 35                              115       23
  36 years and over                    118        9





significant difference between the time interval of onset of poliomyelitis and the ad- 
ministration of the test. There was a significant difference in the test scores between 
males and females as indicated by an F of 10.15, again exceeding the .01 level of 
confidence. To investigate the difference between the means where the over-all age
differences were significant, Scheffé’s method for getting the standard error of the
difference of the means was used. The result was that the 20 years and under age group
and the 21-25 year age group differed significantly from the 31 and older age groups.

It is noteworthy that the mean IQ on this test tends to increase with increased 
age groups (Table 1). The questions to be answered are (a) is this strictly a function 
of the test? (which we seriously doubt because the WAIS does compensate for in- 
creases in ages), (b) is this a selective factor of our poliomyelitic population?, and 
(c) is this a true difference which would tend to indicate that the older age groups 
tend to function more efficiently, intellectually? Our results are inconclusive as to 
these factors and further investigations need to be undertaken to answer these 
questions. 

The second part of this investigation dealt with a comparison of the preonset to 
postonset intellectual functions of the patients on whom preonset intelligence test 
data were available. Wilcoxon’s Matched-Pairs Signed-Ranks Test was used for three
reasons: (1) The group distributions were not normal. (2) The data were derived from
matched samples. (3) A non-parametric technique was desired which does evaluate
and weigh the actual numerical difference between two scores rather than just
ranking them as other nonparametric techniques do.
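
A minimal sketch of how the Wilcoxon T statistic is obtained from paired scores may help here. The function name and the example scores are hypothetical (the individual patient scores are not reproduced in this text); the procedure shown is the standard one: drop zero differences, rank the absolute differences with tied values sharing the average rank, and take the smaller of the two signed rank sums.

```python
def wilcoxon_t(pre, post):
    """Return (T, n): the smaller signed rank sum and the number of non-zero pairs."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # zero differences are dropped
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive, negative), len(ordered)


# Hypothetical T scores for five patients, not data from the study.
pre_scores = [50, 55, 60, 45, 52]
post_scores = [48, 50, 61, 40, 49]
print(wilcoxon_t(pre_scores, post_scores))
```

The resulting T is then compared against a tabled critical value for the given n; a small T means the differences run predominantly in one direction.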

The results indicated that there was a significant decrease in intellectual function,
postonset (Table 2), as indicated by a T (Wilcoxon) of 48 which with an N of


TABLE 2. COMPARISON OF PRE- AND POSTONSET TEST SCORES (T SCORES)

Patient      Preonset      Postonset
             Score         (WAIS)

46.1
20 is significant beyond the .05 level of confidence. It is also noteworthy that out of the
20 patients, seven increased scores from pre- to postonset while 13 decreased their
scores. However, the mean increase was only 3.7 scale points while the
mean decrease was 9.7 scale points. The increase left scores roughly the same and
is mostly a chance factor, whereas the decrease was highly significant, being almost one
standard deviation.


SUMMARY AND CONCLUSIONS 


The level of intellectual function of 93 post bulbospinal poliomyelitic patients 
was investigated to determine if any systematic changes had occurred. The results 
indicate that (a) intellectual scores increase monotonically with increasing age, (b) 
male scores were significantly higher than female scores, and (c) a significant re- 
duction in intellectual function from preillness function is found. 

While this investigation utilized only postpoliomyelitic patients, it is probable
that similar results would be obtained using other disease groups requiring long-term
hospitalization. It is, therefore, necessary to emphasize that the above reduction in
test scores is not necessarily due to polio, but may be a depression effect due to
hospitalization per se. In evaluating the test results of postpoliomyelitic patients it is
therefore important that these results be considered since a negative bias may result
in predicting a patient’s future potential.











THE EFFECT OF BRAIN DAMAGE ON RAVEN’S 
PROGRESSIVE MATRICES 
ALBERT H. URMER, ANN B. MORRIS¹ AND LEONARD V. WENDLAND


Respiratory and Rehabilitation Center for Poliomyelitis
Rancho Los Amigos Hospital, Downey, California


PROBLEM 


In a recent review on Raven’s Progressive Matrices (PM), Burke® lists a 
number of investigations using the PM with atypical groups but none dealing with 
the effect of brain damage on PM performance. The performance on the PM ob- 
viously is determined partially by visual organization abilities in conjunction with 
any intellectual factors. Concurrently the literature reports the sensitivity of visual 
organization tasks to brain damage; therefore, one would expect PM performance 
changes beyond those due to the intellectual function depression concomitant with 
brain damage. Two hypotheses were made: (a) brain damaged subjects will perform 
quantitatively poorer than non-brain damaged subjects of similar intellectual func- 
tion and (b) they will make different response patterns which can be considered as 
qualitatively different. 


PROCEDURE 


Two groups of 20 subjects each were group matched on the basis of age and sex, 
with one group having a medical diagnosis of a non-neurological nature (control 
group), the other having a diagnosis of encephalomalacia due to cerebral vascular 
accident (CVA group). All CVA subjects were right-handed prior to illness and 
postillness 18 were left hemiplegics and 2 were right hemiplegics. Eisenson’s“? 
aphasia test was administered prior to testing to eliminate aphasics. The age range 
for the two groups was from 30-60 with the control group mean age = 44.2 and the 
CVA group mean age = 44.9. 

The Wechsler Adult Intelligence Scale (WAIS) and the Progressive Matrices 
1938 form (PM: revised order 1956) © were administered to all Ss with the order of 
presentation randomly determined. The Digit Symbol subtest of the WAIS was 
omitted and the four remaining subtests prorated to produce the Performance Score. 
The booklet form of the PM was administered individually with E recording the 
answers. 


RESULTS 


A comparison of the two groups on the total WAIS score using the Mann- 
Whitney U test yielded a U = 108 (p = .02) which indicated significant intellectual 
performance differences between the two groups. The total WAIS score being sensi- 
tive to organic involvement, the three scales least sensitive to brain damage (Inform- 
ation, Comprehension and Vocabulary)“: ?) were utilized and the two groups com- 
pared using the sum of the three scale scores. The results yielded U = 197 (p = .47) 
which indicated the groups are similar as to their intellectual function when the 
effect of organic involvement is minimized. 
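
For readers unfamiliar with the statistic, the following is a small sketch of how a Mann-Whitney U can be computed by direct pair counting. The function name and the sample values are hypothetical and are not the scores from this study. With two groups of 20, U ranges from 0 to 400 and its expected value under no group difference is 200, which is why a U of 197 indicates similar groups while a U of 108 does not.

```python
def mann_whitney_u(x, y):
    """Return the smaller U statistic for two independent samples."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0   # x value outranks y value
            elif a == b:
                u_x += 0.5   # ties count half
    u_y = len(x) * len(y) - u_x  # the two U values always sum to n1 * n2
    return min(u_x, u_y)


# Hypothetical scaled-score sums for two small groups.
control = [30, 34, 36, 41]
cva = [28, 33, 36, 39]
print(mann_whitney_u(control, cva))
```

The pair-counting form is slower than the rank-sum form for large samples but makes the meaning of U transparent: it counts how often a score from one group exceeds a score from the other.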

In evaluating the quantitative performance differences between the two groups on
the PM the analysis of covariance technique was used to control for any intellectual 
differences between the two groups. The total PM scores were regressed against the 
total WAIS scores to determine if the depression on PM performance for the CVA 
group is greater than any WAIS changes. The result yielded F = 10.54 (p = .05) 
which indicated the two groups differed statistically significantly on the PM beyond 
any differences on the WAIS. The performance of the two groups on the PM was 
then evaluated using analysis of variance. 





¹Submitted in partial fulfillment toward the requirements for the Master of Arts degree to the
faculty of Occidental College by Ann B. Morris.

The Respiratory and Rehabilitation Center for Poliomyelitis is aided by an annual grant from
The National Foundation.
The results indicated significant differences between the two groups (F = 30.8,
p < .001), between the five PM sets (F = 24.9, p < .001) as well as significant inter- 
actions (F = 29.3, p = .01). The group means were compared using Scheffe’s “ 
method but no consistent interaction trends were found (Table 1). 


TABLE 1. COMPARISON OF DIFFERENCES BETWEEN GROUP MEANS ON PM

Experimental Group        Control Group
B    C    D               B    C    D    E

 2.92
 3.75*    0.83
 6.83**   3.91*    3.08
 8.08**  11.00**  11.83**  14.91**
 5.25**   8.17**   9.00**  12.08**   2.83
 1.50     4.42*    5.25**   8.33**   6.58**   3.75*
 1.17     4.09*    4.92**   8.00**   6.91**   4.08*    0.33
 3.67*    0.75     0.08     3.16    11.75**   8.92**   5.17**   4.84*

*p = .05
**p = .01


The qualitative performance differences between the two groups were evaluated 
next. A consistency score was developed based on the assumption that the items 
within any set were ordered as a function of their difficulty; therefore, fewer errors 
are expected on the earlier items than the later items in any set. As a function of 
item difficulty the expectation is that once an error is made few later items will be 
answered correctly. Thus, answering items 1-10 correctly would yield a consistency 
score of 100, while answering items 2-11 correctly would yield a consistency score of 
0. The most striking result was the difference in response consistency between the 
two groups. The consistency score was developed by determining the percentage of 
correct responses on any set S made prior to making an error. 
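
The consistency score as defined above can be sketched as follows. This is an illustrative reading of the definition, not the authors' scoring procedure: the function name is hypothetical, the 12-item set length in the examples reflects the standard PM sets, and returning `None` when a subject has no correct responses is an assumption, since the text does not say how that case was handled.

```python
def consistency_score(responses):
    """Percentage of a subject's correct responses on one set made before the first error.

    `responses` is a list of booleans in item order (True = correct).
    """
    total_correct = sum(responses)
    if total_correct == 0:
        return None  # assumption: the score is undefined with no correct responses
    correct_before_first_error = 0
    for is_correct in responses:
        if not is_correct:
            break
        correct_before_first_error += 1
    return 100.0 * correct_before_first_error / total_correct


# Items 1-10 correct on a 12-item set: all correct responses precede the first error.
print(consistency_score([True] * 10 + [False] * 2))            # 100.0
# Items 2-11 correct: the first item is an error, so no correct response precedes it.
print(consistency_score([False] + [True] * 10 + [False]))      # 0.0
```

Both example calls reproduce the two cases given in the text, which is the main check that this reading of the definition is the intended one.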

Table 2 indicates the mean consistency for each set, as well as the significant 
differences between the two groups on sets A (U = 91), B (U = 75), C (U = 59), 


TABLE 2. MEAN CONSISTENCY PER CENT

Set               A      B      C      D      E
Control Group    85     71     73
42
CVA Group        78     54     31
U                91     75     59     95
p               .01   .001   .001    .05     NS








D (U = 95). The differences on set E were not significant. Another qualitative com- 
parison indicates that the CVA group deviated from the control group as a function 
of their deviation score based on Raven’s®) normal score composition. The CVA 
group yields significantly more minus deviations on set A (χ² = 12.25, p = .01) and
significantly more plus deviations on sets D (χ² = 13.87, p = .01) and E (χ² = 19.61,
p = .01).

The final analysis dealt with the type of errors made. The only significant result
indicated that whenever possible the CVA group made an error by figure-ground
reversal.

Table 3 shows the Pearson correlation coefficients comparing WAIS and PM
performance for the two groups. It is noteworthy that the correlations for the control 


TABLE 3. PEARSON CORRELATION WAIS - PM





Groups Full Scale Score Verbal 





Control 47 .43 
CVA .40 .03 





D. Pict. B. 
Inf. Sim. Span. Voc. Comp. Des. 


Control .33 .34 31 .10 .24 .52 .45 


42 ‘ 
CVA .06 01 -26 25 07 —.21 75 -65 -50 





group are similar for the Verbal (.45), Performance (.43), and total (.47) scores 
(with PM), while for the CVA group the Verbal-PM correlation drops to .03 while 
the Performance-PM correlation rises to .73. On the subtests the differences between 
the two groups are most noteworthy on Information, Comprehension and Vocabu- 
lary. 


DISCUSSION


The results indicate that PM performance by brain damaged Ss is depressed
much more than WAIS performance, which confirms the
initial hypothesis that the PM is sensitive to brain damage. Another aspect confirming
this hypothesis is the low CVA group correlations between total PM scores
and the WAIS Verbal, Information, and Comprehension scores and the minus correlation
with Vocabulary. These low correlations are due to the performance on these
WAIS subtests not being depressed by brain damage while the PM performance is
depressed. Concurrently the WAIS Performance correlation with the PM for the
CVA group increased due to the performance part being more sensitive to brain
damage than the verbal part.

The differences between the two groups on the consistency scores reflect response
pattern differences in brain damaged individuals over and above the intellectual score
depression. The lack of difference on set E is due to both groups performing poorly.
The implication of this factor is that tests constructed on the basis of item difficulty 
change in their relative item difficulty for brain damaged individuals. This may re- 
sult in an extreme score depression if testing is discontinued after a specified number 
of item failures. Another implication of the consistency score differences between 
the two groups is that some items may change qualitatively as well as quantita- 
tively for CVA subjects. In conjunction with this result it is noteworthy that the 
CVA group performed relatively poorest on the simplest set (Set A) as indicated 
by their minus deviations from Raven’s normal score composition. 


SUMMARY AND CONCLUSIONS


Two groups, one control and one with a diagnosis of cerebro-vascular accident,
were compared on the WAIS and Raven’s Progressive Matrices. The results indicated
that brain damaged Ss perform qualitatively as well as quantitatively poorer
on the PM than control Ss. The total PM performance for brain damaged Ss is depressed
beyond any intellectual depression reflected in the WAIS. The qualitative
changes are reflected by the CVA group being less consistent in responding to the 
item difficulty as well as making figure-ground reversal errors. It can be generally 
concluded that the use of instruments which cease testing after a specified number 
of failures would penalize brain damaged individuals excessively. 

TEST-RETEST PERFORMANCE OF SCHIZOPHRENICS ON TWO
FORMS OF THE PORTEUS MAZES


RICHARD F. DOCTER¹
University of California, Los Angeles²


PROBLEM 

Porteus has developed an Extension series of pencil maze tests intended to
give scores “. . . as nearly equivalent as possible to . . . the original series . . .” (p. 24)
and “. . . standardized against the original series in such a way as to be, on the average,
practice-free” (p. 5). The present study examines these claims by comparing the
successive performance of a group of schizophrenics given the Vineland³ and Extension
series of the maze tests. In addition to providing normative data, these
findings may be of value to investigators desiring to use the two forms in pretest- 
posttest designs. Porteus’ norms showing comparative performance on the two 
series of mazes are of limited value for adult clinical populations because his subjects 
were 300 normal adolescents age 14 to 17, and further, he administered the test 
forms in immediate succession. 


PROCEDURE 


The Vineland series was administered three days prior to the Extension series 
to patients undergoing psychological evaluation in an Army neuropsychiatric center. 
In most cases a different examiner gave the second test, and the instructions were 
repeated prior to the retest examination. The subjects were 60 Army enlisted men, 
diagnosed in one of the subtypes of schizophrenia. Median age was 24, with a range
from 18 to 44. None had received any form of somatic treatment prior to the time of 
testing. Median for highest school year completed was 11, with a range from three 
years through college graduation. Wechsler-Bellevue Form I Full Scale IQ scores 
ranged from 62 to 127 with a mean of 100, sigma 15. Most of these patients were 
hospitalized in closed wards at the time of testing, although about 25% were des- 
cribed by their psychiatrist as ‘‘in partial remission’’. 


RESULTS 


In Table 1 are summarized the Vineland and Extension series Test Age (TA) 
and Qualitative scores. None of the differences between means is significant. The 
correlation⁴ between the two distributions of TA scores is .50, exactly the same as
reported by Porteus®. For the Q scores the correlation is .51; Porteus does not 
report similar data. 


¹John McCraken and David Goldstein assisted in the administration of tests.
²Data were collected while the author was on active duty in the United States Army.
³The Vineland Revision is sometimes identified as the Original series or the 1933 series.
⁴All correlations are Pearson product moment.

TABLE 1. TEST-RETEST DATA FOR VINELAND AND EXTENSION MAZE SERIES

          Present Study (N = 60)            Porteus Norms (N = 300)
          Vineland        Extension         Vineland        Extension
          TA      Q       TA      Q         TA      Q       TA      Q
Mean      15.0    32.8    15.7    23.7      15.4    23.9    15.4    23.9
σ          2.0    16.7     2.7    15.6       1.5    15.3     1.6    15.4











Examination of individual test-retest scores revealed that subjects who were 
initially above the median for TA scores tended to have slightly lower retest scores; 
those below the median typically earned higher scores on retesting. This shift in 
scores, which makes for equivalence of group means, appears to have been achieved
by making the upper end of Extension series substantially more difficult, thereby 
reducing the retest scores of those who do best on the mazes; this added difficulty 
does not penalize those with lower scores as they typically did not receive credit for 
the most difficult mazes on initial testing. 

The present data correspond closely with previous findings on the relationships 
between TA score and various tests of intelligence. Using the Wechsler-Bellevue 
Form I Full Scale IQ score, the correlation with the Vineland series was .40; the Ex- 
tension series correlated .50. The higher correlation for the Extension series might 
be attributed to its greater range of difficulty which serves to increase the range of 
TA scores. 

The Qualitative score (Q score) is a weighted summation of several different 
penalty scores assessed for violations of test instructions (cut corners, cross lines, lift 
pencil, etc.). On retesting, none of the Vineland Q scores increased as much as one
standard deviation, while 38% decreased at least this amount and 23% decreased 
at least two standard deviations. 

Various investigators have split Q score distributions into high and low groups 
using 29 as a cut-off score. The Vineland low Q score group was 45% but this in- 
creased to 73% with retesting on the Extension series. Almost all of the large de- 
creases in Q score were among cases originally above 29. 

Vineland Q and Vineland TA scores correlate —.24; Vineland Q score and Wech- 
sler-Bellevue Full Scale IQ score correlate —.18. These correlations are quite similar 
to those reported elsewhere®: *). The reliability of scoring the Q score was checked 
by having three trained scorers evaluate each of the Vineland tests; the resulting 
intercorrelations were .95, .96, and .96. 


DISCUSSION

One of the requirements for the Extension series was that practice effects from 
the Vineland be ‘‘eliminated or controlled” ®. The purpose of controlling practice 
effects is to make possible comparable test-retest scores, and Porteus has accomp- 
lished this, at least with reference to group means. But it is clear that the method 
for achieving this was not through “‘control’’, for Porteus has increased the difficulty 
of the Extension mazes so as to produce a substantial number of small decrements 
among subjects who scored high on the Vineland series. These changes have the 
effect of balancing the improvements made by about 75% of the subjects whose 
Vineland scores were low; the result is comparable group means for the two test 
forms. Whatever importance this may have, it is clear that the result has not been 
achieved by elimination or control of practice effects. Judging from the number of 
subjects who improve their scores, the maze tests appear to be highly susceptible to 
practice effects. 

The question of reliability of Maze test performance has often been raised, but 
little attention has been given to this matter. For example, in his review Louttit 
wrote: “Based on clinical use alone, the instrument has apparently been found 
acceptable by clinical psychologists. However, there are still questions unanswered 
as to the meaning, reliability, and validity of the Maze Test performance’ ™ P. %6),
Advocates of the Maze test have argued that to develop a reliable test of this kind 
would be at the expense of measuring individual differences in adaptation, planning, 
and problem solving which are unique to this task. This may be true, but one must 
then be aware that he is not using the procedure as a test which can be expected to 
yield scores suitable for interpretation. Porteus recognized that the introduction of 
the Extension series brought into focus the problem of reliability. But instead of 
using some conventional method for studying the reliability of performance on the 
forms separately, or the reliability of differences between the two forms, he chose to 
use similarity of group means as an index of reliability. 

It is not satisfactory to argue that group means, however similar, demonstrate 
reliability. The relationship between the Vineland and Extension TA scores has been 
shown to be .50 in the present study and also by Porteus. It is proposed that this be 
interpreted as indicative of low test-retest reliability (coefficient of stability) of the 
Vineland and Extension series. There are no data reported in this study or elsewhere 
pertaining to the reliability of differences between scores on the two forms. 


SUMMARY 

Sixty males diagnosed in one of the categories of schizophrenia were given the 
Porteus Maze Test, Vineland Revision, followed three days later by the Extension 
series of mazes. Group means for Test Age and Qualitative score do not differ sig- 
nificantly for the two forms, but the test-retest correlation for each of these scores is 
considered to be quite low. It is recommended that caution be used in applying these 
tests in pretest-posttest designs. Data are described pertaining to direction of retest 
changes for Test Age and Qualitative score; subjects who have performed well on 
initial testing show less change than those who have first earned lower scores. This 
seems attributable to practice effects. 

LEARNING IN THE POST-ELECTROSHOCK PERIOD
SPIRO B. MITSOS¹
Evansville (Indiana) State Hospital 


PROBLEM 

Differences in professional opinion exist regarding the efficacy of re-educative 
psychotherapy in immediate conjunction with shock therapies“: ?» *} , Basic to 
arguments on either side is the degree of impairment of learning processes immedi- 
ately following shock and the rate of recovery of abilities. In a recent study, Zirkle“ 
demonstrated profound interference with learning ability during the immediate post- 
electroshock period, with a gradual return to normal levels within a few hours after 
shock was administered. These results were interpreted as contraindicating re- 
educative therapy in immediate conjunction with shock therapies. 

Zirkle’s interpretation, though appropriate to the data, did not consider the
possible savings in learning trials through practice in the immediate post-shock
period. It was therefore decided to apply the savings method to test the hypothesis
of learning interference.


¹The author wishes to express his appreciation to Mr. Jan Pickett who assisted in the collection
of data.

PROCEDURE 

Subjects included 24 hospitalized psychiatric patients for whom electroshock 
therapy had been prescribed. On three successive days prior to the initiation of 
therapy, each subject was tested with two trials on ten-pair verbal associate learning 
tasks. 

Alternate subjects were then placed in the Practice Group and Non-Practice 
Group. Subjects in the Practice Group received two trials on the learning tasks as 
soon as they regained consciousness after their first three shock treatments. They 
were again tested with the same tasks approximately six hours later. Subjects in the 
Non-Practice Group were tested only in the afternoon six hours after each of their 
first three treatments. Thus each subject was tested on each of six sets of ten-pair 
series. The lists of pairs had been pre-tested for equal difficulty and the order of pre- 
sentation to each subject was systematically varied. 


RESULTS 


Mean scores for each group are presented in Table 1. Certain of these means
were compared with t tests with the following results: Total group performance
prior to shock was significantly higher than that during the delayed post-shock
period (t = 5.04, p < .01). The Practice Group performance during the delayed period
was not significantly different from that of the Non-Practice Group (t = .128, p > .90).


TABLE 1. MEAN PERFORMANCE SCORES DURING PRE-SHOCK AND POST-SHOCK
PERIODS

                                        Immediate      Delayed
Group                   Pre-shock       Post-shock     Post-shock

Total Group               12.89                          9.53
Practice Group            12.83           1.89           9.44
Non-Practice Group        12.94                          9.61





CONCLUSIONS


These results clearly support the previous findings of profound interference with 
learning following electroshock. Furthermore, practice in the immediate post- 
electroshock period does not seem to have significant effect on performance during a 
more remote period. 

The significant difference between total group performance prior to shock and 
during the delayed post-shock period is not consistent with previous results. For 
some reason the rate of recovery is not as rapid as that described by Zirkle. This
discrepancy, however, would not seem to alter the major finding regarding the lack 
of effect of practice trials. In brief, the results of this experiment provide further sup- 
port for the contention that re-educative therapy is not feasible in immediate con- 
junction with electroshock therapy. 


THE PERFORMANCE OF AGED FEMALES ON FIVE NON-LANGUAGE
TESTS OF INTELLECTUAL FUNCTIONS¹


ARMAND W. LORANGER² AND HENRYK MISIAK
Fordham University 


Normative data on the performance of subjects over 70 years of age on psychological
tests are only beginning to appear. This study presents normative data obtained
during a neuropsychological study of 50 females 74-80 years of age, on 5
tests of intellectual functions: the Digit Symbol subtest of the Wechsler Adult 
Intelligence Scale (WAIS), the Porteus Maze, the Primary Mental Abilities (PMA) 
Reasoning test, the Raven Progressive Matrices, and the Wisconsin Card Sorting 
Test (WCST). Practical and theoretical considerations in the employment of these 
tests with the aged are also discussed. 


SUBJECTS 


The 50 subjects were selected from a total of 140 female residents (74-80 years 
of age) of 8 different homes for the aged. Selection was based mainly on the ab- 
sence of neurological, ophthalmological, and psychiatric pathology, or serious illness 
of any nature. The residents were from varied socio-economic backgrounds, and 
included retired domestics, clerical workers, school teachers, attorneys, and other 
professional people. Many of the homes are the most modern and progressive in the 
world, with extensive activities, programs, and freedom to visit away from the homes 
whenever the residents so desire. In the “progressive’’ homes the atmosphere of 
institutionalization or isolation from the community was virtually non-existent. The 
typical subject was in the home because she lacked living relatives and /or was im- 
pressed by the excellent living conditions there. The median educational level was 9 
years of schooling, and ranged from 0 to 19 years. The average length of residence in 
the homes was two and one-half years. The cooperation and interest of the subjects 
in taking the tests was excellent. In fact many asked for additional tests, or copies 
of the tests to take with them to work on in their spare time. Some even offered to 
pay for the “examination’’. 


PROCEDURE 


The order of administration of the tests was as follows: Porteus Maze, WCST, 
Raven Matrices, Digit Symbol, and PMA Reasoning. No more than one test was 
given on any single day, with the exception that the Digit Symbol and PMA Reason- 
ing were both given at the same testing session. The Porteus Maze and WCST were 
individually administered, while the other tests were either given individually or in 
small groups of two or three subjects. 


Porteus Maze. In administering the Maze Test there was one slight departure from the
directions given by Porteus. In the printed mazes the exit is not indicated above the seventh
year. In the present study the examiner pointed to the exit of each maze and clearly designated
it as such with an arrow. Unless the exit was shown to them in this way some of the subjects
expressed confusion about what they were supposed to do. The average testing time was 30 minutes.


Wisconsin Card Sorting Test. The procedure and instructions in giving the WCST were the 
same as those of Fey. The test was terminated when the subject successfully completed 6
categories, or sorted 64 cards in succession without obtaining 10 successive correct responses. 
The average testing time was 25 minutes. 
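
The stopping rule described for the WCST can be sketched in code. This is only an illustrative reading of the rule as worded here, not Fey's published procedure: it assumes that a category counts as completed once 10 successive correct sorts occur, and that the 64-card count restarts after each completed category. The function name, its parameters, and the return labels are hypothetical.

```python
def wcst_termination(responses, run_needed=10, categories_needed=6, card_limit=64):
    """Apply the stopping rule to a sequence of True/False (correct/incorrect) sorts.

    Returns (reason, cards_used, categories_completed).
    Assumption: a category is completed by `run_needed` successive correct sorts,
    and the `card_limit` count restarts after each completed category.
    """
    categories = 0          # categories completed so far
    run = 0                 # current streak of correct sorts
    since_last = 0          # cards sorted since the last completed category
    cards_used = 0
    for correct in responses:
        cards_used += 1
        since_last += 1
        run = run + 1 if correct else 0
        if run == run_needed:
            categories += 1
            run = 0
            since_last = 0
            if categories == categories_needed:
                return ("completed", cards_used, categories)
        elif since_last == card_limit:
            return ("card limit", cards_used, categories)
    return ("unfinished", cards_used, categories)


# One completed category followed by 64 sorts with no further run of 10.
print(wcst_termination([True] * 10 + [False] * 64))
```

Under this reading, a subject who never produces a run of 10 correct sorts stops after 64 cards with no categories completed, which would correspond to the subjects later reported as completing no categories.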


Raven Progressive Matrices. The Progressive Matrices was administered according to the
directions in the test manual. The only exception was that the subjects were instructed to
write their answers in the test booklet next to the problem, rather than on a separate answer sheet.
This was done to eliminate errors in transcribing the answers. The average testing time was 55
minutes.

¹This investigation was supported by a PHS research grant (M-1283) from the National Institute
of Mental Health, Public Health Service. Appreciation is expressed to the superiors and sisters of the
homes of the Carmelite sisters and Little Sisters of the Poor in New York City for providing the
subjects. This paper was presented at the American Psychological Association Convention, Cincinnati,
Sept., 1959.

²Now at New York Hospital, Westchester Division, White Plains, N. Y.

Digit Symbol. The Digit Symbol test was given according to the standard procedure with
a time limit of 90 seconds. 


PMA Reasoning. Preliminary work with this test revealed that it was the most difficult to 
administer to the aged. The regular testing booklet and separate answer sheet were replaced by 
mimeographed sheets. The latter duplicated the original in every way except that capital letters 
and larger print were used, because on the standard answer sheet some subjects confused the 
letter ‘’’ with the letter ‘‘j’’. The examiner usually spent up to 20 minutes assisting the subject 
with the 10 practice exercises. During the actual test when the standard six minutes had elapsed, 
the subject was given a red pencil and allowed to complete the remaining items. In this way 
both timed and untimed scores were obtained. The average testing time, exclusive of the practice 
period, was 25 minutes. 


RESULTS 
TABLE 1. MEDIANS, MEANS, STANDARD DEVIATIONS, AND RANGES OF TEST SCORES (N = 50)*

*All are raw scores except Porteus Maze which are mental age scores.
**Measures based upon obtained scores divided by stages completed.


Note: On the WCST 11 subjects did not successfully complete any categories, therefore it was not
possible for them to commit a perseverative error. 


The means, medians, standard deviations, and ranges of scores are presented in 
Table 1. The intercorrelations of scores from the 5 tests and the added variables of 
age and education are presented in Table 2. Split-half reliabilities for the Raven 
Progressive Matrices and the PMA Reasoning (untimed), according to the Spear- 
man-Brown Prophecy Formula, are .91 and .93 respectively. 
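
As a brief illustration of the Spearman-Brown step-up mentioned above, the sketch below applies the usual form of the formula, r_full = k·r_half / (1 + (k − 1)·r_half) with k = 2 for a split-half estimate. The half-test correlations are not reported in the text, so the values computed here are back-calculated from the published reliabilities and are illustrative only; the function names are hypothetical.

```python
def spearman_brown(r_part, k=2.0):
    """Step up a part-test correlation to the reliability of a test k times as long."""
    return k * r_part / (1.0 + (k - 1.0) * r_part)


def implied_half_correlation(r_full):
    """Invert the k = 2 case: half-test correlation implied by a full-test reliability."""
    return r_full / (2.0 - r_full)


# Reported (stepped-up) reliabilities from the text.
reported = [("Raven Progressive Matrices", 0.91), ("PMA Reasoning (untimed)", 0.93)]

for label, r_full in reported:
    r_half = implied_half_correlation(r_full)
    print(f"{label}: implied half-test r = {r_half:.3f}, "
          f"stepped up = {spearman_brown(r_half):.2f}")
```

Under this formula the reported values of .91 and .93 would correspond to correlations between halves of roughly .83 and .87.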


TABLE 2. INTERCORRELATIONS (N = 50)





1 2 3 4 5 





Digit Symbol .55 .48 .62 
Porteus Maze .60 .50 
PMA Reasoning aera 


PMA Reasoning (untimed) 
Progressive Matrices
WCST: Categories Completed


Education
For 48 degrees of freedom and a one-tailed test of significance: .05 = .238, .01 = .329.


On the Digit Symbol all subjects are below the mean scaled scores for ages 16-44 
in the WAIS standardization. The mean scaled score (2.40) is the same as that of 
the male sample 75 years and over in the Kansas City old age standardization of the 
WAIS, and is not significantly different from the female sample of the same age 
group. This suggests that the present institutionalized sample does not differ in
intellectual functioning from a non-institutionalized sample drawn from the com- 
munity. 

The PMA Reasoning test was standardized on an 11-17 age group. On this test
96% of the aged fail to equal the performance of the average 11 year old. When the 
test is given untimed the performance of the aged improves. However, the average 
11 year old taking the PMA Reasoning test with a time limit still exceeds 88% of the 
aged working with no time limitations, clearly demonstrating that lack of speed is 
not a major cause of the inferior performance of the aged on this test. On the Raven 
Progressive Matrices the average elderly subject performs like the average eight 
year old in the standardization population. All but 1 of the 50 subjects are below
the median scores for ages 13-25. 

Standardization data are lacking for the Wisconsin Card Sorting Test. How- 
ever, to make but one comparison, in a study by Fey of subjects 18-34 years of 
age, 39 of 47 normals attained the criterion of 6 complete categories compared to 
only 2 of 50 in the present old age group. 

The Porteus Maze yielded a broad range of scores, discriminating better among 
the elderly subjects than any of the other tests. Although half of the subjects have 
mental ages below 10 years, 16% have mental ages of 15 years or better. The limited 
standardization of the Porteus Maze, the low ceiling for young superior subjects, 
the all-or-none method of scoring, and the slight alteration in administrative pro- 
cedure (placing an arrow at the exit) may be factors responsible for the comparatively 
better performance of several of the subjects on this test. It may also be that fore- 
sight and planning capacity are intellectual functions which are less affected in some 
individuals by the aging process than abstract reasoning and perceptual-motor 
functions. 

Although on all tests there is a wide range of scores, one is impressed with the 
generally inferior performance of the aged. Practically no elderly subject performs 
as well as the average adolescent and young adult, except on the Porteus Maze. The 
present findings are consistent with the observation of other investigators that old 
people have difficulty with tests of intellectual functions that involve the comprehension
of new ideas and the adoption of new work methods. Elsewhere, the authors
have attempted to relate this impaired functioning to neuropathological processes 
associated with aging. 


SUMMARY 


This study presented data on the performance of 50 females 74-80 years of age 
on 5 non-language tests of intellectual functions: the Digit Symbol subtest of the 
Wechsler Adult Intelligence Scale (WAIS), the Porteus Maze, the Primary Mental 
Abilities (PMA) Reasoning test, the Raven Progressive Matrices, and the Wisconsin 
Card Sorting Test (WCST). On all tests there was a considerable range of scores, but
almost none of the aged performed as well as the average adolescent and young 
adult, except on the Porteus Maze. 


USE OF THE AMMONS FRPV WITH THE LONG-TERM
CHRONICALLY ILL*


DAVID M. STERNE 
VA Hospital, Vancouver, Washington 


PROBLEM 


The Ammons Full-Range Picture Vocabulary Test (FRPV) appears to have
promise for use with chronically ill, long-term patients and especially those with 
expressive aphasic disorders, as a means of evaluating remaining intellectual effec- 
tiveness and responsiveness to verbal communication. Alternate forms of the test 
are provided, and the test can be given to anyone able to point or to signal agreement 
or disagreement as the examiner points to test materials. 

Prior to its use with aphasic patients it appeared desirable to evaluate the 
FRPV under conditions permitting more clear-cut results. The test was therefore 
employed with hospitalized, chronically ill individuals who, apart from the handicap 
of aphasia, were similar in general character and were receiving treatment similar 
to that accorded the aphasic patients for whom the test was being considered. Of 
particular interest were measures of reliability of the two parallel forms of the test 
and of validity, here defined as correlation of Form A with the Wechsler Adult Intelligence
Scale (WAIS).


METHOD 


The subjects were 60 male patients in a Veterans Administration general medical 
and surgical hospital, ranging in age from 36 to 86 with a mean of 65.7 years. Both 
forms of the FRPV were administered with half of the subjects receiving Form A 
first and half being given Form B initially, the period intervening between adminis- 
trations ranging from one day to three weeks. For several reasons it was possible to 
give the WAIS to only 54 of the subjects. 


RESULTS 


Correlating the FRPV scores for Forms A and B, an r of .94 was obtained, com- 
parable to findings reported when the instrument has been used with a wide variety 
of other groups. Correlations between FRPV scores on Form A and WAIS Full
Scale, Verbal Scale, Performance Scale and subtest scores are presented in Table 1 
as are mean weighted scores and IQ’s. 


TABLE 1. CORRELATION OF FRPV SCORES ON FORM A WITH WAIS SCORES,
AND MEAN WAIS SCORES AND IQ’S


Scores M IQ 

*The writer wishes to express his appreciation to Lucius Forbes and Beverly Sonoda, graduate
students at the University of Oregon, and Paul Resta, graduate student at Washington State College, 
for their assistance in collecting the data. 

IQ’s estimated from FRPV scores and adjusted for the effects of aging¹ correlated
.85 with the obtained WAIS Full Scale IQ’s and .86 and .73 with WAIS
Verbal and Performance IQ’s. FRPV mean IQ’s of 97.9 for Form A and 98.6 for
Form B provided a slight under-estimate of WAIS Full Scale IQ’s (M 99.9). The
FRPV appeared to correlate as highly with the WAIS Full Scale score as did any of 
the subtest scores. 

The FRPV itself was generally well-received by the subjects, who showed much
less evidence of fatigue and resistance with it than with the WAIS. Among the factors
to be considered in the use of the FRPV are the short time required for its administration,
its impersonal and intrinsically interesting pictorial form, and the
degree of challenge which it presents with some opportunity for success for almost 
any subject. These characteristics would seem important in any technique con- 
templated for use with subjects particularly sensitive to the effects of fatigue, dis- 
comfort, sagging morale, and chronically lowered physical reserves. 


SUMMARY 
The two forms of the Ammons Full Range Picture Vocabulary Test were ad- 
ministered to 60 male patients with long-term, chronic illnesses and the Wechsler 
Adult Intelligence Scale was used with 54 of these. Correlation coefficients of .94 
were found between the two forms of the FRPV and .84 between Form A and the 
WAIS Full Scale Score. The use of the test with long-term chronically ill patients was
briefly discussed. 

¹Adjustment for the effects of aging was made on the basis of smoothed curves extrapolated from
the scaled score equivalents of WAIS Vocabulary raw scores for different age groups (pp. 101-110).


PERFORMANCE OF SUSPECTED AUDITORY MALINGERERS ON 
THE SUBTLE-OBVIOUS KEYS OF THE MMPI 


FRANK KODMAN JR., GORDON SEDLACEK, AND ERNEST MCDANIEL 
University of Kentucky 


PROBLEM AND PROCEDURE 


Kodman, Sedlacek and Powers found a significant difference between the
MMPI profiles of a control sample of hard of hearing non-malingerers equated with 
a sample of suspected auditory malingerers, referred for medico-legal examination of 
their hearing. Each of the subjects was screened for psychiatric disorders, organic 
brain damage, sensory aphasia, mental retardation and ear disease by a team of 
specialists. The suspected malingerers exhibited extreme variability of their audi- 
tory responses on a battery of hearing tests. The control group did not exhibit other 
than normal psychophysical response variability. 

An item analysis of the standard 566 MMPI items showed a total of 68 which 
separated the groups at the .05 level. A cross validation study of these items is 
currently underway on a new sample of suspected auditory malingerers. The critical 
items were analyzed statistically using the Lawshe and Baker nomograph for
computing the significance of differences between percentages. 
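
The nomograph itself is not reproduced in the text. As a rough analytic stand-in, and not the Lawshe and Baker procedure, a pooled two-proportion z test illustrates how a difference between two groups' endorsement percentages on a single item can be evaluated. The group sizes and counts in the example are hypothetical, since the text does not report them, and the function name is an assumption.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z test (a substitute technique, not the nomograph).

    x1, x2: number of subjects in each group answering an item in the keyed direction.
    n1, n2: group sizes. Returns (z, two-sided p).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 0.0, 1.0     # identical all-or-none responding: no evidence of a difference
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical item: 15 of 20 suspected malingerers vs 5 of 20 controls answer "true".
z, p = two_proportion_z(15, 20, 5, 20)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With samples as small as those typical of such studies, a normal approximation of this kind is only a rough guide.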

The numbers of the items in the booklet were 2, 9, 30, 67, 79, 103, 147, 155, 
178, 185, 187, 198, 221, 265, 285, 383, 400, 429, 486, 495, 524, 539, 552 and 563 and 
were scored as false by the suspected malingerers. Items 11, 14, 22, 29, 40, 48, 69,
87, 104, 121, 156, 157, 159, 184, 197, 204, 211, 251, 256, 263, 266, 268, 279, 297, 320, 
321, 325, 326, 344, 345, 354, 356, 360, 363, 370, 373, 378, 393, 413, 441, 447, 456, 514 
and 543 were scored as true by the suspected malingerers. 

It seems cogent to examine the test taking attitude of our suspected malingerers 
so as to afford greater insight into the personality dynamics influencing malingering. 
Wiener has reported data on two sets of MMPI items which differ markedly as to
the apparent interpretation of the subtle or obvious content underlying the items. 
They were classified as subtle or obvious by a group of clinicians and were drawn 
from the D, Hy, Pd, Pa, and Ma scales. In addition to the item analysis, we were 
interested in a comparison of our two groups on the Subtle and Obvious keys to 
determine if they would discriminate between our malingerers and nonmalingerers. 


RESULTS

The analysis of variance summarized in the upper half of Table 1 shows that our
two groups do not differ significantly on the Subtle key; however, there is a significant
difference at the .01 level between the responses of the groups on the five subscales.
There is, however, a non-significant interaction between subjects and subscales
(G × S). In view of the non-significant interaction, the statistical difference
between subscales is of little importance. The items in each subscale differ in number 
and thus contribute unequally to the variance. The interaction also indicates that 
test taking attitude was invariant over subscales. 


TABLE 1. ANALYSIS OF VARIANCE OF THE SUBTLE AND OBVIOUS KEY RESPONSES
BETWEEN SUSPECTED MALINGERERS AND NORMALS

Source           df    Mean Square    F

Subscales (S)      4
G × S              4
Error            170

Obvious Key

*p < .01


The analysis of variance in the lower half of Table 1 indicates that the suspected 
malingerers and normals differed significantly at the .01 level in their responses to 
the Obvious items. There was also a significant difference between the subscales at 
the .01 level. Inspection of the subscale means showed that the malingerers scored 
higher than the normals on the Obvious items. The suspected malingerers also 
showed a greater tendency to score the items in the pathological direction. 

There is further support for the hypothesis that our malingerers present a more 
consistent neurotic personality pattern than do the normals. From an analysis of 
the original MMPI subscales which contribute to the Obvious key, it was noted that 
49.3% of the 146 items were taken from two of the subscales (D, Hy) which comprise 
the MMPI neurotic triad; whereas, 31.5% were selected from two of the subscales 
(Pa, Ma) which comprise the MMPI psychotic grouping. The remaining 19.2% 
were chosen from the Pd subscale. Since the Pa subscale showed no significant 
difference between the two groups, the contribution of the psychotic subscales is 
further minimized. The Ma subscale contributed only 15.8% of the 146 items. Thus
the majority of the items came from the subscales which are indicative of psychoneurosis.
This type of analysis cannot be performed on the Subtle items since there
were neither consistent nor significant differences between the two groups.
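
The percentages quoted above can be checked against the stated total of 146 Obvious items. The sketch below back-calculates the item counts implied by each percentage; these counts are inferred by rounding and are not given in the text, and the variable and function names are hypothetical.

```python
TOTAL_OBVIOUS_ITEMS = 146  # stated in the text

# Percentages as reported in the text.
reported_percent = {
    "D + Hy (neurotic triad)": 49.3,
    "Pa + Ma (psychotic grouping)": 31.5,
    "Pd": 19.2,
}


def implied_count(percent, total=TOTAL_OBVIOUS_ITEMS):
    """Nearest whole number of items consistent with a reported percentage."""
    return round(percent / 100.0 * total)


implied = {name: implied_count(pct) for name, pct in reported_percent.items()}

for name, count in implied.items():
    print(f"{name}: {count} items ({100.0 * count / TOTAL_OBVIOUS_ITEMS:.1f}%)")
print("sum of implied counts:", sum(implied.values()))
print("Ma alone (15.8%):", implied_count(15.8), "items")
```

The implied counts sum to the full 146 items, so the three reported percentages are mutually consistent.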

A further analysis of the subscale performances on both the Subtle and Obvious
keys is reported in Table 2. The t tests show non-significant differences between the
individual subscales of the Subtle key and significant differences between the individual
subscales of the Obvious key except for the Pa (paranoid) subscale. In
other words, the malingerers did not differ from the normals on the subscales of the
Subtle key, but did differ significantly on four of the five subscales of the Obvious
key.

TABLE 2. STATISTICAL ANALYSIS OF THE SUBTLE AND OBVIOUS SUBSCALES
BETWEEN MALINGERERS AND NORMALS

                          t Tests
Subscales         Subtle key    Obvious key
D                    1.09          3.38
Hy                   1.45          3.38
Pd                    .48          2.
Pa                   1.01          1.
Ma                    .74          2.

t, .05, 17 df        2.11
t, .01, 17 df        2.90








SUMMARY 
Two groups of hard of hearing adults, one group suspected auditory malingerers 
and the other a control group, were compared by means of the Subtle and Obvious 
keys of the MMPI. The groups performed similarly on the Subtle key, but differed
significantly on four of the five subscales of the Obvious key. The significant subscales
were the D, Hy, Pd, and Ma. The Pa (paranoid) subscale did not discriminate
between our two groups. It was argued that the malingerer performs as if he were 
psychoneurotic and not psychotic. An item analysis revealed a set of 68 MMPI 
items which may serve as a preliminary scale for detecting auditory malingerers. 


EDITORIAL OPINION (continued from page 223)


intensive case study. We need to develop a course designated as “psychological diagnosis”
directly comparable to the medical course “physical diagnosis” in which the
student is trained to use all his senses in gathering all the evidence which can be 
secured from direct examination methods. Such a course is a basic requirement for 
developing clinical ability but is now almost nonexistent in our training programs. 

We challenge so orn sa with the proposition that they have an ethical 
responsibility to broaden their own experience and to validate their competence as 
psychologists by systematically testing their predictions in actual clinical situations. 
When is a psychologist not a psychologist? When he never deals with the total 
person interacting with his environment. 

W.A.H. 


F. C. T.





PICTORIAL REPRESENTATIONS OF SITUATIONS 
INVOLVING THREAT¹


GEORGE J. WISCHNER AND ALBERT E. GOSS 
University of Pittsburgh University of Massachusetts 


PROBLEM 


In a previous study Wischner obtained information about stutterers’ reactions
to speaking by asking them to “draw whatever you think most adequately
represents your behavior immediately before, during, and after a moment of stuttering.”
Analysis of the cycle of behavioral events depicted in the drawings, which
were often elaborated by accompanying spontaneous verbalizations, indicated that
anxiety (fear, tension, stress) was a central component in these events. The typical
pattern was one of anxiety-instigation in the “before” period which increased during
the “attempt-to-speak” phase, and then fell off markedly in the “after” period. In
general, this anxiety process, as revealed in stutterers’ drawings, was consistent with
experimental findings reported by Goss from which temporal gradients of
anxiety in stuttering behavior were inferred.

Mowrer, Schilder, and other research with stutterers suggest that the
anxiety cycle surrounding the stuttering moment should hold for any situation
which individuals consider threatening. Therefore, the primary objective of the
present study was to determine whether the sequences of reactions in stutterers’
drawings also occur in normal speakers’ drawings of their reactions to situations
and activities involving threat. For one group of normal speakers the activity was
making a speech in a freshman speech class. Drawings of members of a second group
represented their reactions to any unpleasant situation; those of a third group depicted
their reactions to some specific situation. A secondary objective was the
further assessment of research and clinical uses of drawings as a technique for obtaining
information about sequences of reactions to successive, connected materials,
particularly those involving threat.


SUBJECTS AND PROCEDURE 


The 99 Ss were drawn from among larger numbers of students in (a) sections of 
a University freshman speech course from whom data were obtained during the last 
week of the semester, and (b) a psychology of adjustment course who were tested 
early in the semester.² Each section or course was handled as a group. Instructions
to the speech students were: “Draw whatever you think most adequately represents
how you feel or what you would like to do immediately before you start to give a 
speech, when you start to speak, and finally, when you have finished your speech. 
Do the best you can.” 

Psychology students were instructed: “Imagine that you are faced with an
unpleasant situation, a situation you cannot avoid, but must enter. Make a drawing
which most adequately represents your behavior immediately prior to entering the
situation, while you are in the situation, and after you have completed or are out of
the situation.” Following the completion of their drawings, these Ss were asked to
indicate on the back of their sheet what situation, if any, they had in mind as they 
made their representations, and what emotions they experienced or were illustrated 
in their drawings. 

The drawings of three samples of 33 Ss each were analyzed. One sample consisted
of the speech students. A second sample was made up of psychology students
whose drawings were with respect to a specific unpleasant situation and the third
sample consisted of psychology students whose drawings were with no specific unpleasant
situation in mind.

¹These data were reported at the New York meetings of the American Psychological Association,
September, 1957.

²Miss Sue Bageant was responsible for gathering the drawings of the speech students who were
made available through the courtesy of Miss Doris Abramson. Miss Joan Bopp assisted with the data
obtained from the psychology students.


RESULTS 


The results may be introduced best by offering typical examples of actual drawings
included in the three samples. Figs. 1, 2 and 3 present three drawings, one
selected from each of the samples. These drawings are adequately representative of
the samples from which they were selected and reflect, in turn, the character of the
larger number of drawings originally obtained. They illustrate clearly the way in
which graphs were employed by some Ss and changes in facial or bodily features by
others. That they vary in degree of complexity, level of abstraction, amount of
verbal elaboration, and other aspects should be readily apparent.

Fig. 1. Speech student’s representations depicting reactions to having to make a classroom speech.
The drawings were made by a freshman male student.

Fig. 2. Psychology student’s representations of reactions to an unpleasant situation. This student
stated she had no specific situation in mind.

Fig. 3. Psychology student’s representations of reactions to an unpleasant situation.

TABLE 1. CONTENT AND PROCESS ANALYSIS OF DRAWINGS FOR EACH OF THE THREE SAMPLES (N = 33).

Categories were developed for analyzing the drawings both for content and processes
illustrated. Table 1 summarizes the content analysis and shows the number of
drawings falling into each content category for each of the three samples. The
assignment of drawings to categories employed was on the basis of agreement between
the authors. An independent study with the drawings of the speech students
revealed that they could be classified with high interjudge agreement. Because of 
possible differences in the ‘‘populations” from which the speech and psychology 
students were drawn, the absolute numbers in this table should not be accorded too 
much importance. But it is interesting to note that the subsamples of psychology 
students from the same “population” appear to be quite similar with respect to 
frequency of utilization of the various content categories indicated. The data for the 
speech students are seemingly different from the two psychology groups. For 
example, none of the speech drawings fall into the last category, “enactment of a
situation”. This is probably due in large part to the fact that the situation for this 
group—making a speech before a class—was clearly specified, and this condition 
undoubtedly affected the figures for the other categories. The relatively infrequent 
use of graphic and other geometric representations by the speech students, who were 
freshmen and had had no psychology, may reflect a difference in some kind of 
sophistication factor in this group as compared with the psychology groups. 

The processes or cycles of events represented in each drawing were also categorized
in Table 1 showing the number of drawings in each of the three samples that
illustrated each process. Here it is evident that the values for the three samples are
highly similar. The most frequent category by far is that of “tension or fear-release.”
What was classified in the “other” category is of some interest. In these drawings,
there was often no representation of an “after” period. Evidently for some Ss the
problem really never gets solved, the situation is never adequately handled. There is
confusion, conflict, indecision, sorrow, defeat, and introspection concerned with the
adequacy with which the situation was handled. In some instances S merely said of the
termination of the situation, “I don’t know.”

Of some interest, although it occurs relatively infrequently, is the third process
category which ends with “more fear.” It is to be noted that in such instances some
release is usually included, but the rearousal of fear seems to be related either to the
anticipation of again having to go through the same or a similar situation or to retrospective
thoughts of what might have been.

In Table 2 are offered examples of the descriptive language accompanying many 
of the drawings. Similar terms were used by psychology students in replying to the 
question asking what emotions were represented in the drawings. Again the draw- 
ings by the speech students differed from those of the others in that they showed less 
tendency to employ language. 


TABLE 2. EXAMPLES OF LANGUAGE EMPLOYED BY Ss IN THEIR REPRESENTATIONS OF THE BEFORE,
DURING, AND AFTER PERIOD OF MAKING A CLASSROOM SPEECH OR FACING ANY UNPLEASANT
SITUATION

Before                      During                                      After

Scared stiff                Hang onto something for dear life           ZZZZZZZ! !

Worry, think                Do best and get it over with                Tired but relieved

Fear                        Pseudopoise. Have to go through with it.    Defeat - Damned if I know,
                                                                        Confusion - Success or triumph
                                                                        Peace

Thought                     Crisis                                      Pleasant feeling

Nervous, can’t be quiet     Sit down and do what one has to             Completely worn out both
                                                                        physically and mentally,
                                                                        but glad it is over

Nervous, sweating, tense    Confused, all messed up                     Relaxed, smiling, not care

Fearful                     World shattered                             Despair, hate, sense of
                                                                        failure

Anxiety                     Apprehension                                Elation


SUMMARY


Of primary interest was the question of whether sequences of reactions ob- 
served in stutterers’ drawings of their behavior before, during, and after a moment 
of stuttering would also occur in normal speakers’ drawings of their reactions before, 
during, and after threatening situations and activities. Three samples of 33 drawings 
each were analyzed. One consisted of drawings by freshmen speech students who 
made portrayals of the before, during, and after stages of a classroom speech. Psy- 
chology students in the other two samples were instructed to imagine an unpleasant 
situation which could not be avoided and to draw representations of their behavior 
before entering, during, and following termination of the situation. Ss also later 
described the specific situation, if any, to which the representations referred. Thirty- 
three records involving a specific situation made up one sample; thirty-three not 
involving a specific situation constituted the other sample. 

Drawings were analyzed for content, process represented, and spontaneous 
verbal descriptions used. As was the case with stutterers, drawings in this study 
demonstrated a markedly similar process in most Ss. There was a cycle of events 
involving progressively increasing anxiety followed by a reduction in this state with 
termination of the situation. Content employed included graphic representations, 
facial changes, stick figures and overcoming obstacles. Spontaneous verbalization 
in the before and during stages included most often “fear,” “pressure,” “nausea,” 
and “apprehensiveness.” Recurring in the after period were “relief,” “calm,” and 
“relieved of burden.” 
WORD-ASSOCIATION AND THE DRIVE HYPOTHESIS OF ANXIETY¹ 
FREDERICK H. KANFER 
Purdue University 


PROBLEM 


The word-association technique has been widely used in clinical practice but 
there have been few attempts to explore the general determinants of the responses. 
Recent research on verbal behavior indicates that in addition to S’s unique associa- 
tion patterns, his current anxiety may also become a determinant of his verbal out- 
put“: 2), Application of Taylor’s anxiety drive hypothesis“ to the verbal associa- 
tion experiment suggests that the emission of idiosyncratic responses may be due not 
only to blocking or repression but also may be related to the differential associative 
probabilities of several responses following a common verbal stimulus. This approach 
would predict that infrequent responses may be given to some words, not 
only on the basis of their content but also on the basis of the availability of responses 
for this stimulus word. The purpose of the present study was to examine Taylor’s 
predictions of differential performance of high and low anxious Ss on a word association 
task. 


¹This study was supported in part by Research Grant M-2027 from the National Institute of 
Mental Health, U. S. Public Health Service. 

Taylor’s formulation for prediction of performance on a learning task states 
that: (a) In a simple non-competitional situation, if the stimulus is associated with 
only a single response of high probability of occurrence, then high-anxious Ss should 
yield a higher frequency of this response than low-anxious Ss. (b) If several responses 
have approximately equal probabilities of occurrence in the presence of a stimulus, 
then highly anxious Ss should yield a lower percentage of the correct response than 
low-anxious Ss. 

Since these predictions for correct and incorrect responses in learning tasks are 
based on the probability values for their occurrences, they should be directly ap- 
plicable to a word-association procedure. As in a learning task, a stimulus word 
represents a ‘‘simple, non-competitional situation’’ when any one response to it has 
a considerably higher probability of association than any other. Similarly, the low 
probability of a given verbal response is equivalent to Taylor’s instance of a weak, 
correct response when many competing incorrect responses exist. Thus, according 
to Taylor’s second hypothesis unusual responses in the word-association test may 
result from momentary strength of a verbal response which ordinarily has a low 
probability of following the stimulus. Therefore, in anxious Ss, response-competition 
would represent an important determinant of the association in addition to personal 
factors. 


METHOD AND PROCEDURE 


Stimulus words were selected from the tables of word association probabilities 
by Russell and Jenkins, based on 1008 Minnesota college students. Prior to the 
present study a preliminary sample of associations was obtained from 175 Ss and it 
was found that the Russell and Jenkins norms were closely approximated in the 
University population from which Ss for the present study were drawn.² 

The stimuli for the present list were selected on the following basis. All stimuli 
were in Thorndike and Lorge’s AA class. The words were also selected according 
to the probability of occurrence of the most frequent responses. On this basis two 
groups of fifteen words were selected. In the first group the most common response 
to each stimulus had a p value between 0.52 and 0.83, far exceeding the p value 
of the next most common response (p between 0.04 and 0.21). These stimuli will be called 
“high conformity” words and represent the non-competitional situation described 
in the first hypothesis above. The second group consisted of stimulus words to which 
the most frequent response had a p value between 0.12 and 0.26 and the next most 
common response had a p value between 0.10 and 0.24. These 15 stimuli will be called 
the “low conformity” words. The expected degree of response competition is derived 
from the difference in the probability of the first and second response. The high 
conformity stimuli were further subdivided into three levels, according to the 
absolute probability values for the first response. The resulting groups of five words 
will be called conformity levels (C). C1, C2, and C3 represent classes of stimuli 
which show great differences in the p values of the first and second responses and also 
differ in the absolute p values of their most frequent response. In the low conformity 
group a similar subdivision yielded levels C4, C5, and C6. The subdivision permits 
further analysis of responses within the two groups as a function of the absolute p 
values of the most frequent response. The stimuli are presented in Table 1. 
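
The selection rule described above can be illustrated in a few lines of code. This is only a sketch under stated assumptions, not the author's procedure: the function name `classify_stimulus` and the example p values are hypothetical, and only the numeric ranges are taken from the text.

```python
def classify_stimulus(p_first, p_second):
    """Classify a stimulus word from the norm probabilities of its
    most common (p_first) and next most common (p_second) response.

    The ranges are the ones stated in the text; anything falling
    outside both ranges is treated here as not selected (None).
    """
    if 0.52 <= p_first <= 0.83 and 0.04 <= p_second <= 0.21:
        return "high conformity"   # one dominant response, little competition
    if 0.12 <= p_first <= 0.26 and 0.10 <= p_second <= 0.24:
        return "low conformity"    # two responses of nearly equal strength
    return None


def competition_gap(p_first, p_second):
    """Difference between first and second response probability.
    A small gap means strong response competition."""
    return p_first - p_second


# Hypothetical p values, chosen only to exercise each branch.
dominant = classify_stimulus(0.67, 0.10)
competitive = classify_stimulus(0.20, 0.18)
unselected = classify_stimulus(0.40, 0.30)
```

A small gap between the two probabilities corresponds to the highly competitive case, a large gap to the non-competitional case.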

For each S a conformity score was computed for each level, for the combined 
high and low conformity levels (i.e., for C1 plus C2 plus C3, and for C4 plus C5 
plus C6) and for the entire list of selected words. The score was obtained by summing 
S’s frequencies of congruence with the most common response on the Minnesota 


²The data were collected while the author was on the faculty of Washington University. 
TABLE 1. WORD STIMULI AND p VALUE OF THEIR FIRST AND SECOND RESPONSE 
(BASED ON THE MINNESOTA NORMS) 

Stimulus   1st response   2nd response 

Non-competitional Associations 
C2 

hard       .67 
bread      .61 
woman      .64 
light      .65 
butter     .64 

Highly Competitive Associations 
C5 

hand       music       comfort 
river      window      wish 
foot       blue        memory 
red        moon        head 
green      beautiful   child 


norms. According to the experimental hypothesis, Ss with high Manifest Anxiety 
Scale (MAS) scores were expected to yield higher conformity scores on levels C1, 
C2 and C3 but lower scores on C4, C5 and C6 in comparison with low MAS scorers. 
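
The scoring rule just described (counting how often an S gives the most common response of the norms, by level and in combination) might be sketched as follows. The norm entries, level assignments, and names such as `NORMS` and `conformity_scores` are hypothetical and serve only as illustration; the actual norm responses are not reproduced in the text.

```python
from collections import Counter

# Hypothetical norm table: stimulus -> (most common response, level).
NORMS = {
    "bread": ("butter", "C2"),
    "hard": ("soft", "C2"),
    "hand": ("foot", "C5"),
    "river": ("water", "C5"),
}
HIGH_LEVELS = ("C1", "C2", "C3")   # non-competitional stimuli
LOW_LEVELS = ("C4", "C5", "C6")    # highly competitive stimuli


def conformity_scores(responses):
    """Count congruence with the most common norm response.

    responses maps each stimulus word to the single response S wrote.
    Returns per-level counts plus the combined high, low, and total
    scores.
    """
    per_level = Counter()
    for stimulus, response in responses.items():
        if stimulus not in NORMS:
            continue  # word is not one of the selected stimuli
        common, level = NORMS[stimulus]
        if response.strip().lower() == common:
            per_level[level] += 1
    high = sum(per_level[lv] for lv in HIGH_LEVELS)
    low = sum(per_level[lv] for lv in LOW_LEVELS)
    return {"per_level": dict(per_level), "high": high,
            "low": low, "total": high + low}


# One hypothetical S: two congruent and two idiosyncratic responses.
example = conformity_scores(
    {"bread": "butter", "hard": "rock", "hand": "foot", "river": "lake"}
)
```

Under the hypothesis stated above, high MAS scorers would be expected to show a larger "high" count and a smaller "low" count than low MAS scorers.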

The Russell and Jenkins modification of the Word Association Test was administered 
to 335 students of both sexes in Elementary Psychology classes. For half 
of the Ss the list was reversed in its order of word presentation. The test was paced 
by E who read each stimulus and allowed 5 seconds for recording a response. After a 
one-minute rest period Ss completed the MAS at their own pace. 


RESULTS 


The total population (N = 335) yielded a mean of 14.65 (S.D. = 7.55) on the 
MAS. The median score was 13.36. These findings approximate the distribution 
reported by Taylor®. Ss were selected from the extremes of the present distribu- 
tion. 54 Ss with scores between 21 and 29 (above 80th percentile) made up the high- 
anxiety (HA) group. The low-anxiety (LA) group consisted of 54 Ss whose scores 
fell between 4 and 8 (below 20th percentile). 


TABLE 2. MEAN CONFORMITY SCORES 
The mean conformity score on all 100 stimulus items for the HA group was 
10.38 (S. D. = 1.52) and for the LA group was 11.11 (S. D. = 1.62). The difference 
is not significant by t-test (p > .05). Table 2 presents the means and standard 
deviations of the conformity scores for the groups on the six conformity levels. An 
analysis of variance on conformity scores (Table 3) reveals no significant difference 


TABLE 3. ANALYSIS OF VARIANCE ON CONFORMITY SCORES 

Source                 df     MS     F       P 

HA vs. LA               1     3.     0. 
Ss within groups      106     3. 
Conformity Levels       5            82.46   <.01 
                              1. 
Levels X groups         5 
Ss within L X G       530 




between high and low scoring Ss on the MAS. The interaction between groups and 
levels also fails to reach significance, indicating that the HA and LA groups did not 
respond differentially on the various levels, as predicted by the Taylor hypothesis. 
Thus, Ss in the HA and LA groups yielded approximately the same conformity 
scores over the entire range of levels. The two groups also revealed no significant 
differences in their performance on stimuli which have several highly competitive 
responses associated with them and on those to which only one response is highly 
probable. The significant effect of conformity levels (F = 82.46) indicates that the 
stimuli clearly differed in their tendency to evoke the common response given by the 
Minnesota norms. 


DISCUSSION 

The present findings suggest that the contribution of S’s anxiety level in determining 
responses in the word association test is not sufficiently great to account 
for the response mainly in terms of the probabilities of association to the stimulus. 
In fact, idiosyncratic hierarchies of associations appear to influence the responses in 
all Ss. Therefore, individual deviations from group norms appear to be rather com- 
mon. In order to test whether there is a general tendency for high MAS scorers to 
conform more to the group norms on the total word association list, r’s were run 
separately for groups HA and LA. Neither r differed significantly from zero. These 
results suggest that MAS scores and conformity to group norms were not related in 
the present study. 

The relationship of the conforming tendency on the Kent-Rosanoff and on other 
verbal tasks has been explored by Peterson and Jenkins®? in a pair of Ss selected on 
the basis of high and low communality (conformity) on the Word-Association Test. 
The authors suggest that the communality variable may be related to other char- 
acteristics of verbal behavior, such as consistency, associative fluency, etc. Consider- 
ing the present finding, it appears that those personality characteristics which are 
said to be assessed by the MAS do not overlap with those manifested by conformity 
on the Word-Association Test. 


SUMMARY 

The study investigated the relationship between MAS scores and performance 
on the Word-Association Test. Specifically, it was predicted from Taylor’s formula- 
tion that high-anxious (HA) Ss would yield more common responses when the word 
stimulus has a single highly probable response and fewer common responses when 
many equally probable responses are available, than low-anxious (LA) Ss. 

From the Kent-Rosanoff list 30 stimuli were selected which varied in the proba- 
bility with which the most common response occurred. The responses of 54 Ss who 
scored above the 80th percentile on the MAS and 54 Ss who scored below the 20th 
percentile in a population of 335 college students were analyzed. Frequency of congruence 
with the Minnesota norms yielded a conformity score for each S, on each 
set of stimuli with high and low probability of response. An analysis of variance 
failed to show significant differences on conformity scores between the HA group and 
the LA group. Furthermore, no interaction between MAS scores and response 
probability was found. The results failed to support Taylor’s hypothesis and suggest 
that drive level does not significantly affect conformity to common word associations. 
TEACHER’S ATTITUDE AND CHILDREN’S PERFORMANCE ON THE 
BENDER GESTALT TEST AND HUMAN FIGURE DRAWINGS 


ELIZABETH MUNSTERBERG KOPPITZ 
Endicott, N. Y. 


PROBLEM 

In trying to understand and predict a pupil’s behavior and achievement in 
elementary school we must not only consider his endowment and home environment 
but also his teacher. The present study explores the effect of the teacher’s attitude 
on the performance of the Bender Gestalt Test and Human Figure Drawings by first 
graders. A survey of the literature shows that both the Bender Gestalt Test and 
Human Figure Drawings have been used in two different ways. Both tests are believed 
to reflect emotional factors as well as maturation in the visual motor area. 
Thus Bender“: »- "?) and Goodenough “: »- ) approach these tests strictly from the 
developmental point of view while Hutt, Machover“*) and Levy“) use them as 
projective tests in the process of diagnosing patients. Research findings ®: 4: 5. 2. 4, 16) 
seem to offer support for both methods of approach in using the Bender and Human 
Figure Drawings. However, it may be expected that the obtained results would 
differ depending on the method of interpretation used. In this paper both ap- 
proaches to the Bender and Figure Drawings will be considered. 

Four hypotheses will be tested in this study. In the first two hypotheses the 
Bender and Human Figure Drawings are considered as developmental tests which are 
primarily the result of maturation and therefore should be influenced but little by 
the emotional state of the child. It is predicted therefore that the teacher’s attitude 
will have no significant effect on the performance of the Bender Gestalt Test by first 
graders as measured by the Koppitz scoring system. It is further hypothesized 
that the teacher’s attitude will have no significant effect on the Human Figure 
Drawings of first graders as measured on the Goodenough Scale. 

In the last two hypotheses the Bender and Figure Drawings are approached as 
projective tests and are scored for emotional factors. It is hypothesized in this study 
that the attitude of a driving, authoritarian, and restrictive teacher will be reflected 
in a high incidence of constriction, tension, and striving indicators on the Bender and 
Human Figure Drawings of her students. On the other hand it is predicted that an 
easygoing, permissive, and warm teacher’s attitude will be reflected in a low incidence 
of tension indicators on the Bender and the Human Figure Drawings. 


METHOD 


Subjects. The Ss for this study were taken from two first grade classes in the same 
elementary school. Thus all Ss were drawn from the same general neighborhood and 
had very similar backgrounds. The most distinguishing factor between the two 
classes was the teacher. 

In Class A the teacher was middle aged, unsmiling, tense, rather rigid, very 
conservative and suffered from a slight physical handicap. She maintained strict 
order in her room. Any noise or confusion was upsetting to her. To quote some of her 
students’ ‘spontaneous’ comments: “. . . when we talk in class our teacher gets 
mad . . .”, “. . . she gets mad so easily, every day she keeps someone after school 
. . .”, “. . . in school I do what the teacher says or else she starts yelling, but at home 
I do as I please . . .”. None of the students voiced any spontaneous affection for the 
teacher and several indicated that they did not like school. Yet this teacher was very 
conscientious and worked hard. Her chief concern appeared to be the teaching of 
school subjects; she appeared to have less interest in the individual child. Above all 
this teacher enjoyed the teaching of reading. The author observed Class A on three 
different occasions; each time the students sat very quietly in their seats and worked. 
Even several minutes before the bell rang, following the noon recess, there was hardly 
a sound to be heard; each child sat at his desk, and there was little interaction between 
students. 

Observation in Class B showed a marked contrast. When the author entered 
the room a few minutes before the bell rang she found herself in the midst of a happy 
confusion. Children were running about, they were talking, laughing, playing. The 
teacher, also middle aged, sat smiling at her desk. She was completely relaxed and 
spoke warmly to the pupils who approached her. When the bell rang each child went 
to his desk, settled down quickly and without delay the group seemed eager and 
ready to begin work. To the casual observer this class presented a happy, relaxed 
picture where both teacher and students enjoyed each other and enjoyed working 
together. The teacher’s chief concern seemed to be the children. In individual interviews 
later on several Ss spontaneously told the author of their fondness for the 
teacher and their love for school. None expressed any fear or dislike of teacher or 
school. 

Class A and B varied somewhat in the number of repeaters and of very immature 
and young children. Since both the Bender and Figure Drawings are influenced by 
age no child repeating the first grade was included in this study. Only those children 
were included who had entered the first grade in the fall and who had taken the 
Metropolitan Readiness Test at the beginning of the school year, that is, prior to 
being exposed to the respective classroom teachers. It was possible to match 8 boys 
and 8 girls from each class on the basis of age and the total Metropolitan Readiness 
Test score: 

Class   Boys   Girls   Mean Age        Age Range                   Mean Met. R. T.   Range Met. R. T. 

A       8      8       6 yrs. 11 mo.   6 yrs. 6 mo.-7 yrs. 5 mo.   77.5              57-88 
B       8      8       6 yrs. 11 mo.   6 yrs. 6 mo.-7 yrs. 6 mo.   76.5              54-87 





Procedure. During the last month of school the author administered the Bender 
Gestalt Test to each S individually. At that time each S was also asked what he liked 
to do best in school. Within a week of this testing session each classroom teacher 
administered the Human Figure Drawing test to her class as a whole. Following the 
written instructions provided by the author, the Ss were asked to draw “a whole 
person.” All test protocols were scored by the author blindly, that is, without her 
knowing into which group the individual test paper belonged. The Bender protocols 
were first scored according to the Koppitz system and then again for indicators of 
tension and perfectionism “. These indicators were defined as follows: 
1. Constriction. All 9 Figures crowded into less than half of the sheet of paper; all 9 Figures 
strung along one edge of the paper. 
2. Two or more erasures. Part or all of two Figures erased and redrawn. 
3. Second attempt. After partially or wholly completing a Figure a second drawing of the same 
Figure is added spontaneously. 
4. Lines substituted for circles in Fig. 2. Lines must be at least 1/16” long; dots or filled in circles 
are not scored for this point. 


The Human Figure Drawings were first scored according to the Goodenough 
system and then again for indicators of repression, tension, striving, and hostility as 
described by Buck ®), Levy), Machover“*), Koppitz ®, and others. They include: 

1. Large head. Head larger than body suggesting striving for or concern with intellectual achieve- 
ment. 

2. Excessively long neck. Suggesting restraint, concern with control over impulses, inhibition. 

3. Tiny drawing. Figure less than 2” tall suggesting anxiety, shyness or tendency to withdraw. 

4. Teeth. Suggesting hostility, resentment, indirect aggressiveness. 

5. No Pupils turned sideway, up or down, suggesting suspicion, paranoid feelings, hostility, 
avoidance. 


And finally the expressed favorite activity of each S was tabulated. Thereafter the 
test scores for Class A and Class B were compared. 


RESULTS 

Favorite activity in school. In Class A 13 of the 16 Ss gave reading as their first 
choice. Of the remaining three Ss in that Class one was a very bright and talented 
girl who named drawing as her favorite subject with reading a close second. The 
other two Ss had moderately severe speech disorders and found oral reading difficult; 
they preferred seatwork and writing respectively. In Class B 5 Ss chose reading, 7 
Ss preferred writing, one arithmetic, two drawing, one games. A Chi-square comparing 
the number of Ss in both classes who chose reading with those who chose something 
else was found to be highly significant, that is, χ² = 8.18, P < .01. 

These findings strongly indicate that the Ss were influenced by their teacher’s 
attitude in their stated preference of school activity. The teacher of Class A put great 
stress on reading and had a marked personal preference for this subject. The in- 
formation gathered from the children in one brief interview is too limited to say with 
any assurance whether they had learned from their teacher a real love for reading or 
whether they had learned to repeat what was expected of them by their teacher. The 
uniformity of choice in Class A seems to reflect at least in part the teacher’s author- 
itarian approach. This impression was increased when even rather dull students who 
were unable to read at all expressed a fondness for reading but were unable to give 
any reason for their preference. The wider range of favorite activities selected by 
Class B may well be the result of greater individual freedom and the more permissive 
attitude of the teacher. 


Bender Gestalt Test. When the Bender protocols were scored according to the 
Koppitz method the following results were obtained: 

Group    Mean Bender Score    S.D.    Range of Score 
A        4.44                 2.91    0 to 10 
B        3.94                 2.38    1 to 8 
Total    4.19 
















A Chi-square comparing the number of Ss in both groups whose Bender score fell 
above or below the total Mean score was found to be not significant, that is, χ² < 
1.00, P > .20. A t-test also yielded no significant difference between the Mean 
Bender score for Group A and Group B, t = .68. These findings offer support to the 
hypothesis that the teacher’s attitude is not reflected in the number of deviations and 
distortions on the Bender Gestalt Test of first graders. This held true for each 
individual scoring item on the Bender as well as for the Bender Composite Score. 

When the Bender protocols were scored for tension indicators it was found that 
8 Ss in Group A showed one or more such signs while 6 Ss in Class B revealed tension 
signs. A Chi-square comparing the number of Ss in each Group with and without 
tension indicators on the Bender proved to be not significant, that is, χ² = 1.00, 
P > .20. Thus the hypothesis that teacher attitude is reflected in the number of 
tension indicators on the Bender of first grade students was not supported. 

These findings seem to suggest that the Bender is primarily an indicator of a 
young child’s maturity in visual motor perception and coordination and is but little 
influenced by emotional factors. This may well add to its value as a screening tool for 
school beginners. Further study is needed to see whether the Bender’s sensitivity to 
emotional factors increases once a child has reached maturity in the visual motor 
area. At the first grade level it seems that a test other than the Bender is needed to 
assess a child’s emotional state. 


Human Figure Drawings. The scoring of the Human Figure Drawings according 
to the Goodenough system resulted in the following: 











Group    Mean Drawing Score    S.D.    Range of Drawing Score 
A        21.87                 5.19    15 to 33 
B        23.37                 4.81    16 to 30 
Total    22.62 





Neither a Chi-square comparing the Ss in both Groups with a score above or below 
the total Mean score nor a t-test comparing the Mean scores for Group A and Group 
B yielded significant results, that is, χ² = 1.13, P > .20 and t = .82. These findings 
support the hypothesis that teacher’s attitude will show no significant influence on 
Human Figure Drawings when scored by the Goodenough method. 

When the Human Figure Drawings were scored for emotional indicators of 
striving, inhibition, tension and hostility the following results were found: 

                                                   Group A    Group B 
Ss with one or more emotional indicators in HFD    15         10 
Ss with two or more emotional indicators in HFD    9          3 


A Chi-square comparing the number of Ss in each Group with and without emotional 
indicators on the Figure Drawings proved to be significant, that is, χ² = 4.57, 
P < .05. When the number of Ss in each Group with two or more emotional indicators 
were compared the value of χ² = 4.80, P < .05. These results lend support to the 
hypothesis that Human Figure Drawings by first graders are sensitive to the teach- 
er’s attitude and will reflect the same in the number of emotional indicators shown on 
the Drawings. 
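
The chi-square values reported for the Figure Drawings can be checked from the printed counts. The sketch below assumes the ordinary 2 x 2 chi-square without continuity correction and 16 Ss in each Group; the paper does not state which formula was used, and the function name `chi_square_2x2` is illustrative only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table

            with    without
    A        a        b
    B        c        d

    computed without continuity correction (an assumption here).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


N_PER_GROUP = 16  # 8 boys and 8 girls in each class

# Ss with one or more emotional indicators: 15 in Group A, 10 in Group B.
chi_one_or_more = chi_square_2x2(15, N_PER_GROUP - 15, 10, N_PER_GROUP - 10)

# Ss with two or more emotional indicators: 9 in Group A, 3 in Group B.
chi_two_or_more = chi_square_2x2(9, N_PER_GROUP - 9, 3, N_PER_GROUP - 3)

# Favorite activity: 13 of 16 chose reading in Class A, 5 of 16 in Class B.
chi_reading = chi_square_2x2(13, N_PER_GROUP - 13, 5, N_PER_GROUP - 5)

print(round(chi_one_or_more, 2), round(chi_two_or_more, 2), round(chi_reading, 2))
```

Under these assumptions the first two values agree with the reported 4.57 and 4.80. The favorite-activity counts give about 8.13, close to the reported 8.18; the small difference may reflect hand computation.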

Thus it appears that Human Figure Drawings are indicators of developmental 
as well as emotional factors in young school children. The usefulness of drawings is 
greatly enhanced when they are scored in more than one way. Further studies are 
indicated to explore more fully the various dimensions of figure drawings by children. 


SUMMARY 


In this study the effect of the teacher’s attitude on the performance of the 
Bender Gestalt Test and Human Figure Drawings by first graders was explored. 







Sixteen matched pairs of students from two different classes in the same school 
served as Ss. The two classes differed primarily in their teachers. One teacher was 
rigid and authoritarian, the other was permissive and relaxed. The observed be- 
havior in the class rooms revealed a marked contrast, and the expressed attitude and 
stated preference of school activities by the Ss in the two classes indicated a signifi- 
cant difference between them. The Bender and Human Figure Drawings were ad- 
ministered to all Ss. Both tests were scored in two ways: (a) according to develop- 
mental, formal aspects measuring maturation in visual motor perception primarily, 
and (b) according to emotional aspects reflecting tension, perfectionism, striving, 
etc. The test scores for the two classes were compared. The findings offer support 
for the hypothesis that the Bender and Figure Drawings of first graders are not 
significantly influenced by the teacher’s attitude when they are scored as develop- 
mental tests. Contrary to prediction the Bender also proved insensitive to teacher 
attitude when the protocols were scored for emotional factors. Human Figure Draw- 
ings on the other hand revealed significant differences between the two classes and 
seem to reflect the teacher’s attitude when scored for emotional indicators. 
MANIFEST ANXIETY IN PSYCHIATRIC OUTPATIENTS 


M. A. BAILEY F. M. LACHMANN 
M. E. BERRICK D. H. ORTMEYER 


Veterans Administration, New York Regional Office 


PROBLEM 


In recent years, there has been an increased interest in the administration of the 
Taylor Manifest Anxiety Scale (TMAS) “) to patients in mental health clinics. The 
responses to this test have been suggested for use in assessing changes in psychiatric 
conditions resulting from various treatment techniques®: *). Since the distributions 
for outpatient groups may differ from non-psychiatric college groups or other non- 
psychiatric samples as reported by Taylor and others“: *), it would seem worthwhile 
to explore the responses of a large outpatient sample to this test. The present in- 
vestigation was designed to obtain a representative sample of scores in a Veterans 
Administration Mental Hygiene Clinic. It is part of a larger study on the relation- 
ship between clinicians’ use of various psychological tests and manifest anxiety. 


PROCEDURE 

The 128 subjects comprising the sample ranged in age from 23 to 60 years (mean 
34.5 years; S. D. 14.7). The number of years of schooling ranged from seven to 
twenty (mean 11.5; S. D. 2.8). Seventy-two percent were married, 23% single, 
and 5% separated or divorced. Thirty-four percent of the sample was diagnosed 
by the screening psychiatrists as schizophrenic and 66% as neurotic or character 
disorder. No patients with organic pathology were included. 

The Taylor Manifest Anxiety Scale was administered routinely to all patients 
applying for treatment during a three month period with two exceptions, women and 
veterans with reading or language handicaps. The distribution of scores and per- 
centiles may be seen in Table 1. 


TABLE 1. DISTRIBUTION OF FREQUENCY AND PERCENTILES ON TMAS 

Score   No. of Cases   Percentile      Score   No. of Cases   Percentile 
 5       1             .007            27       5             .430 
 6       1             .015            28       5             .469 
 8       2             .031            29       3             .492 
 9       2             .047            30       8             .554 
10       1             .055            31       4             .586 
11       4             .086            32       7             .640 
12       4             .117            33       7             .695 
13       1             .125            34       6             .742 
14       4             .156            35       5             .781 
15       3             .180            36       3             .804 
16       2             .195            37       4             .836 
17       2             .211            38       1             .843 
19       3             .234            39       4             .875 
20       2             .250            40       6             .922 
21       3             .273            42       3             .945 
22       6             .320            43       3             .968 
23       5             .359            44       1             .976 
24       2             .375            45       2             .992 
25       2             .390            46       1            1.000 

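
The percentile column of Table 1 appears to be the cumulative proportion of cases at or below each score. The sketch below recomputes it from the printed frequencies; the variable names are illustrative, and the reading of "percentile" as cumulative proportion is an assumption that the printed values seem to support.

```python
# Score -> number of cases, transcribed from Table 1.
frequencies = {
    5: 1, 6: 1, 8: 2, 9: 2, 10: 1, 11: 4, 12: 4, 13: 1, 14: 4, 15: 3,
    16: 2, 17: 2, 19: 3, 20: 2, 21: 3, 22: 6, 23: 5, 24: 2, 25: 2,
    27: 5, 28: 5, 29: 3, 30: 8, 31: 4, 32: 7, 33: 7, 34: 6, 35: 5,
    36: 3, 37: 4, 38: 1, 39: 4, 40: 6, 42: 3, 43: 3, 44: 1, 45: 2,
    46: 1,
}

n_cases = sum(frequencies.values())

# Cumulative proportion of cases at or below each score.
cumulative = {}
running_total = 0
for score in sorted(frequencies):
    running_total += frequencies[score]
    cumulative[score] = running_total / n_cases

# Mean score recovered from the grouped frequencies.
mean_score = sum(score * count for score, count in frequencies.items()) / n_cases

print(n_cases, round(mean_score, 1))
```

The frequencies sum to the 128 cases described in the Procedure, and the recovered mean rounds to the 27.5 reported in the Results.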
RESULTS 


The mean TMAS score for this sample is 27.5, S. D. 10.1. This mean does not 
differ significantly by t test from the mean reported by Matarazzo for a sample of 44 
psychiatric patients in an outpatient clinic. However, it is significantly higher than 
means reported for non-psychiatric patients. Table 2 compares the mean of the present 
sample to means reported for other psychiatric and non-psychiatric samples. 
TABLE 2. TAYLOR MANIFEST ANXIETY SCALE MEANS AND STANDARD DEVIATIONS FOR PSYCHIATRIC 
AND NON-PSYCHIATRIC SAMPLES 

         NYRO                                             Hammock 
                              Matarazzo 
         Psychiatric    Psychiatric    Non-Psychiatric    Non-Psychiatric 
N        128            44             29                 138 
Mean     27.5           26.2           13.31              13.4 
S. D.    10.1           8.2            10.38              7.8 





A chi-square test was made between occupational level and whether the sub- 
jects scored above or below the group mean on the TMAS. The relationship was not 
significant. A t test was computed between the mean TMAS scores for the psychotic 
and psychoneurotic groups. Although the mean for the psychotic group was higher 
than the mean for the neurotic group, the difference was not significant. 


SUMMARY 
The Taylor Manifest Anxiety Scale was administered routinely to 128 patients 
applying for psychiatric treatment in an outpatient clinic. The results revealed that 
this sample scored significantly higher than non-psychiatric subjects. No significant 
relationship was discovered between scores on this test and such factors as occupa- 
tional level and diagnosis. 
COUNSELOR RATINGS OF PROCESS AND OUTCOME IN 
CLIENT-CENTERED THERAPY¹ 


EUGENE T. GENDLIN, RICHARD H. JENNEY AND JOHN M. SHLIEN 
University of Chicago 
INTRODUCTION 


The therapist’s rating of the outcome of therapy is one of the most widely used 
of outcome criteria in psychotherapy research. In many studies it is the sole cri- 
terion. Such ratings have been shown, in cross validation studies, to have significant 
correlations with self-reporting measures such as Q-sort adjustment ratings“, pro- 
jective analysis of Rorschach protocols“, and diagnostic scores derived from the 
MMPI®). Many factors may influence outcome ratings. The present study ex- 
amines association between outcome ratings and counselors’ observations of the 
process of therapy. 

Theories (Freud, Rank, Sullivan, Rogers) and some research) emphasize the 
important role of the inter-personal relationship between client and counselor in the 
process of psychotherapy. On the other hand, recent studies“: *) have found outcome 


¹Preliminary results of this project were presented at the American Psychological Association 
Convention, 1956, and in Counseling Center Discussion Paper, Vol. III, No. 15. The authors are indebted 
to the Wieboldt Foundation for generous support of this research in its early phase. Further analysis 
is made possible by the grant of the Ford Foundation (Psychotherapy Research Program) to the 
Counseling Center, University of Chicago. 
ratings unrelated to, or negatively correlated with, the extent to which clients focus 
on the relationship during therapy.² In these studies “relationship focus” is measured 
as the amount of client discussion about the therapist or about the relationship. The 
present study attempts to distinguish between observations of relationship focus as a 
topic of discussion and observations indicating use of the relationship for significant 
experiencing. Counselors rate the frequency of specifically defined observations, 
rather than giving only an overall estimate of “relationship focus”. 


THE RATING SCALES 
Association is examined between counselor ratings of outcome and their ratings 
on a six item scale shown in Fig. 1. For purposes of comparison, item 1 reproduces 
Seeman’s scale of relationship focus. Items 2-6 are more specifically defined ob- 
servations. 
FIG. 1. SIX ITEM RATING SCALE 

1. Does therapy, for this client, focus chiefly on his problem, or does it focus chiefly on his relationship 
with you? (This scale separates relationship from problems, regardless of the qualities of either.) 

1    2    3    4    5    6    7    8    9 
Focus on his problems                    Focus on relationship with you 

2. To what extent does the client talk about your general characteristics such as age, sex, looks, 
beliefs, background, school of therapy, et cetera? 

“You’re young so I doubt if you’ll understand me.” 
“You’re non-directive so of course you won’t answer me.” 

1    2    3    4    5    6    7    8    9 
Often                                    Rarely 

3. To what extent does the client find that his relationship with you is an important instance of the 
difficulties he has generally? 

“I feel guilty when I want to be dependent. And I feel that way with you also.” 
“I’m uncomfortable about your opinion of me. Come to think of it, I’m always worried about what 
others think of me.” 

1    2    3    4    5    6    7    8    9 
Not at all                               Very significantly 

4. How important to the client is the relationship as a source of new experience? Example: 

“I’ve never been able to let go and just feel dependent and helpless, as I do now.” 
“This is the first time I’ve ever really gotten angry at someone.” 

1    2    3    4    5    6    7    8    9 

5. To what extent do the problems focus in the past? (Childhood or earlier years.) 

1    2    3    4    5    6    7    8    9 

6. To what extent does the client express his feelings, and to what extent does he rather talk about 
them? (This scale differentiates direct expression from report about one’s feelings, regardless of whether 
the feeling is past or present.) Example: 

“I have this feeling of hate and it’s for you.”        “I hate you.” 
“I was scared last night.”                             “It comes to me now how scared I really was last night.” 
“Often I feel depressed.” (No indication of            “Gee, I feel low.” 
present feeling in either words or voice.) 

1    2    3    4    5    6    7    8    9 
Talk about feeling past or present       Express feelings of the moment 

Since eo “) conclusions are based on a direct examination of client statements whereas the 
conclusions of this study are based on data from rating scales made by counselors, the two studies are 
not directly comparable. The method employed in this study is that used by Seeman “). 

*The authors wish to thank colleague Dr. Vera John for her help in the construction of the scale. 
Theoretically, scales 1, 2, and 5 are indicative of “relationship focus” as a topic of 
discussion, while scales 3, 4 and 6 define “relationship focus” as a use of the relationship 
for significant experiencing. Successful outcome ratings are hypothesized to be 
associated with observations indicative of the latter kind of relationship focus. 
Specifically, outcome ratings will correlate with scales 3, 4, and 6, and outcome 
ratings will not correlate with scales 1, 2, and 5. 


POPULATION AND PROCEDURE 


This scale was rated by 16 counselors of 39 clients who were taking part in a 
research project at the Counseling Center on time limited therapy”. Clients were 
not selected except in the sense that they were willing to participate in the research 
project. Eighty per cent of those asked participated. Counselors ranged in exper- 
ience from 25 years to a minimum of one and a half years and all were staff counselors 
at the Counseling Center. The counselors were asked to use the rating scale after the 
seventh interview and after the last interview. Their ratings were based solely on the 
experience of the counseling hours, without any knowledge on the counselor’s part of 
diagnostic test results or other evaluative measures of these clients. 


RESULTS 
Since our concern is with the relation between characteristics of the therapy 
process as perceived by the counselor and the judged success of the case, the findings 
are presented as correlations between the six items of the scale (a) early in therapy, 
(b) at the end of therapy, and (c) the change in these items, with the outcome rating. 


TABLE 1. CORRELATION OF COUNSELOR OUTCOME RATING WITH 7TH INTERVIEW, 
END AND CHANGE RATINGS ON PROCESS SCALES 

Items                    7th Interview    End      Change 

                         .256             .185     .002 
                         .306             .220 
Instance                 .249             .336* 
  rarely to often 
New Experience                            .400* 
  none to much 
                         .163 

*Significant at the .05 level 
**Significant at the .01 level 


Table 1 presents these correlations and levels of significance. From Table 1 we can 
make these statements regarding the influence of characteristics of the process on the 
counselors’ judgements of outcome, bearing in mind we are discussing only those 
areas of the process covered by this rating scale: 

1. The quality of the therapy process in the first seven interviews is not a 
significant determinant of the counselor’s judgement of success, since none of the 
ratings made after the seventh interview correlate significantly with the out- 
come rating. 


2. At the end of therapy three items definitely are not related to the outcome 
rating. Whether the client is observed to focus on the problem or 
the relationship (scale 1), on the past or present (scale 5), or to discuss the 
characteristics of the counselor (scale 2) does not influence the counselor’s final 
judgment of the outcome. 

3. At the end of therapy, three items are related to the counselor’s judgment 
of success. The successful client is seen as (a) finding his relationship with 
the counselor to be an instance of his general interpersonal difficulties (scale 3); 
(b) deriving from the relationship itself new and significant experience (scale 4); 
and (c) expressing his feelings directly rather than reporting them (scale 6). 


4. On only one process item is change (a difference between the seventh 
and final interview ratings) a significant influence on the counselor’s outcome 
rating. A successful client is likely to be one who moves from reporting his feelings 
to expressing them directly. Change on the other items does not bear a 
significant relationship to the case outcome.⁴ 
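
The significance levels attached to these correlations can be checked with the usual conversion of a correlation coefficient to a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below is illustrative only. It assumes that all 39 clients contributed to every coefficient (the table does not state the n per cell), and the critical values are taken from standard two-tailed t tables for 37 degrees of freedom rather than from the paper. The function name `t_from_r` is introduced here.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r against zero, with n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_CLIENTS = 39       # assumption: every client was rated on every scale
T_CRIT_05 = 2.026    # two-tailed .05 critical t, 37 df (standard tables)
T_CRIT_01 = 2.715    # two-tailed .01 critical t, 37 df (standard tables)

# Coefficients legible in Table 1.
legible = [
    ("Instance, 7th interview", 0.249),
    ("Instance, end", 0.336),
    ("New Experience, end", 0.400),
]

for label, r in legible:
    t = t_from_r(r, N_CLIENTS)
    if abs(t) >= T_CRIT_01:
        level = "significant at .01"
    elif abs(t) >= T_CRIT_05:
        level = "significant at .05"
    else:
        level = "not significant"
    print(f"{label}: r = {r:.3f}, t = {t:.2f}, {level}")
```

With n = 39 the two end-of-therapy coefficients exceed the .05 critical value but not the .01 value, and the seventh-interview coefficient reaches neither, which matches the asterisks in the table.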


SUMMARY 


This investigation examined the association between counselor observations of the 
therapy process and counselor judgments of success. A six item rating scale was 
used by the counselors of 39 clients after the seventh interview and at termination. 
The purpose of the study was to define more specifically what is meant by the client’s 
“focus on the relationship” during therapy. Theoretically the relationship as a 
frequent topic of discussion is differentiated from the client’s use of the relationship for 
significant experiencing. The general hypothesis was that “relationship focus” can 
be defined in terms of two kinds of observable aspects of therapy, one of which will 
correlate significantly with outcome ratings, while the other will not. Specifically, 
the hypotheses were confirmed that outcome ratings correlated significantly with the 
extent to which the counselor observed the client (a) verbalizing that his present 
experience of the relationship is an instance of his more general problems; or (b) is 
an entirely new experience for him; and (c) communicating to the counselor in a 
spontaneous and direct manner of expression. On the other hand, no correlation was 
found between outcome ratings and the extent to which counselors observed the 
client (a) discussing the relationship; (b) discussing the counselor; or (c) discussing 
present events. 
⁴In the Seeman study clients who were rated successful were found to have changed significantly 
on nearly all items. Both pre and post ratings were made only at the end of therapy. It is possible that 
the size of change scores was exaggerated by having the counselors make their before and after ratings 
at the same time, in contrast to the present study, in which the ratings were made at different times. 





ATTITUDES TOWARD THE MENTAL HOSPITAL AND SELECTED 
POPULATION CHARACTERISTICS¹ 


STEVE PRATT, DUILIO GIANNITRAPANI AND PRABHA KHANNA 
Larned (Kansas) State Hospital 


PROBLEM 


Historically, mental hospitals have been, in the main, custodial institutions 
which, if not geographically, are topologically isolated from their town communities. 
Current literature“: * *) however, reflects growing awareness of the significance of 
relationships existing between the hospital (itself considered as a community) and 
the surrounding town-community. Attitudes of the town population toward the 
mental hospital both reflect and constitute important aspects of hospital-town inter- 
action. If these relationships are considered relevant to the reintegration of hospital 
with town-community and to the transformation of mental hospitals from custodial 
to treatment centers, then assessment of such attitudes is indicated. 

Investigation should be directed not only toward general attitudes expressed 
through characteristic colloquialisms, e.g. the “nut house,” “booby hatch,” “dumping 
ground,” etc., but toward attitudes in terms of specific areas of perception of the 
mental hospital, its staff and treatment program and of specific ways in which the 
townspeople perceive themselves in relation to the hospital-community per se. The 
present study explores some of these specific areas of attitude as they are associated 
with selected population characteristics. 


METHOD 


A questionnaire was devised for assessing the expressed attitudes of persons of 
the town-community toward the hospital-community, i.e., toward a large (1500 bed, 
700 staff) state psychiatric hospital, its staff, patients, treatment program, etc. Nine 
staff psychologists, as a group, wrote approximately 200 items considered on a basis 
of group judgment as “best” suited for assessment of relevant areas of attitude 
toward mental hospitals. These 200 items then were ordered within descriptive content 
categories (areas) and the final items selected by group consensus. The final 
questionnaire² consisted of 57 items of the “Yes-No-Don’t Know” and multiple-choice 
type ordered within four main areas of attitude: clinical, social, economic, 
political, and toward the so-called criminally-insane patients (i.e., those mental 
patients having criminal charges against them). These attitude areas may be briefly 
characterized as follows: 

Clinical. This area assessed attitudes toward the hospital perceived as a treatment facility; 
expectations and acceptance-rejection of staff clinical services; attitudes toward hospitalized 
patients and interest in mental health topics or activities. 

Social. Attitudes toward the staff, i.e., perceived staff social valence in terms of expressed 
sociometric preference hierarchy (status); staff motivation and emotional stability; excluding-including 
attitudes, i.e., townspeople assigning staff or themselves to in vs. out-group status 
[illegible] in relation to town and hospital; desirability of having foreign and non-white staff 
members. 

Economic. Perception of the economic potential of the hospital for the town-community, 
and of the staff in terms of consumer-potential; financial reliability of staff members; attitudes 
toward economic and labor aspects of patients. 

Political. Excluding-including, i.e., assigning hospital staff to in vs. out-group status in terms 
of town, civic and political activities; townspeople’s evaluation of staff’s interest in town “good 
citizenship”; perceived need to increase town’s control over hospital and staff. 

Criminally-insane. The differential perception of this subgroup of the hospital-community’s 
patient population, particularly attitudes reflecting fears and stereotypes. 


¹This paper is part of a continuing group research project: Exploration of selected aspects of the 
(mental hospitals) staff-patient-town [illegible]. 
²Copies of the questionnaire are available from the authors upon request. 
Sample. Four hundred adult residents of a small typical midwestern community 
(including the senior high school class), representing a cross-section (nonproportional) 
of the total adult population of approximately 3,000 (mental hospital employees not 
included). Sample is more than 13% of population represented. Members of different 
age, vocational and financial groups were included. The testing was done with the 
cooperation of a wide variety of organizations such as civic clubs, farm and church 
groups, students, teachers, housewives. 


Scoring. The reply to each question was scored according to its representation of an 
expressed positive, negative or indifferent value; direction having been validated in 
terms of consonance in judgment of nine clinical psychologists. These numerical 
scores were added separately for each of the five content areas specified above so 
that five cumulative scores representing the relative degree of positive-negative 
attitude of each subject were obtained. The scores for each of the areas were then 
added to form a cumulative score (termed total score) representing the total of each 
subject’s attitudes for the whole questionnaire. 
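
The scoring procedure just described can be sketched in a few lines. Everything specific in this example is hypothetical: the item identifiers, the +1 / 0 / −1 values, and the assignment of items to areas are placeholders, since the actual scoring key is not reproduced in this paper. Only the logic (score each reply by its judged direction, sum within each of the five areas, then sum the areas into a total score) follows the text.

```python
AREAS = ("clinical", "social", "economic", "political", "criminally_insane")

# Hypothetical scoring key: item id -> (area, value assigned to each reply).
# The real key was fixed by the agreement of nine clinical psychologists.
SCORING_KEY = {
    "q1": ("clinical", {"yes": 1, "no": -1, "dont_know": 0}),
    "q2": ("social", {"yes": -1, "no": 1, "dont_know": 0}),
    "q3": ("economic", {"yes": 1, "no": -1, "dont_know": 0}),
}


def score_respondent(replies):
    """Return (area scores, total score) for one respondent's replies."""
    area_scores = {area: 0 for area in AREAS}
    for item, reply in replies.items():
        area, values = SCORING_KEY[item]
        area_scores[area] += values[reply]  # positive, negative or indifferent value
    total = sum(area_scores.values())       # the "total score" of the text
    return area_scores, total


example_areas, example_total = score_respondent(
    {"q1": "yes", "q2": "yes", "q3": "dont_know"}
)
print(example_areas, example_total)
```

Keeping the area scores separate from the total, as here, is what allows each area to be related to the population characteristics on its own.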

A sixth area assessed attitudes expressed specifically by businessmen toward the 
hospital and staff in regard to economic and business matters. This was explored 
with additional questions constructed for this group and the responses of this sub- 
sample were analyzed separately. These scores were identified with the term bus- 
inessmen. 

As independent variables, selected dimensions identifying characteristics of the 
population were included in the questionnaire so that positive-negative attitudes 
could be compared against these dimensions. Each of these dimensions was divided 
into groups having as nearly as possible an equal number of cases within each group. 
These were: 


Age. 15-19, 20-29, 30-49, 50 and over. 
Length of Residence in Larned. Less than 2½ years, 2½-10 years, 11-29 years, more than 30 years. 
Family Income. Less than $3,000, $3,000-$5,000, $5,000-$8,000, $8,000-$10,000, more than $10,000. 
Sex. Male-Female. 
Marital Status. Married-Single. 
Ownership of Rental Property. Yes-No. 
Larned as Birthplace. Yes-No. 
Occupation. (1) Professional; (2) Farmer; (3) Merchant, Dealer; (4) Financial Officer, Manager; 
(5) Skilled Worker; (6) Semiprofessional; (7) Clerical; (8) Teacher; (9) Student; (10) Housewife. 
Previous Employment at the State Hospital. Yes-No. 
Relatives Employed at the State Hospital. Yes-No. 
Business Ownership. Yes-No. 
Frequency of Visits to the State Hospital. 1-10, 11-70, more than 70. 


Since no significant relationships were found between the last four dimensions and 
the attitude scores, no conclusion can be drawn at this time regarding these particular 
dimensions and they are not discussed in the results. 

All variables considered were chosen in terms of their possible significance in 
relation to the favorable-unfavorable attitude scores obtained. While, for example, 
the relationship of attitudes with age, per se, is of theoretical interest, hospitals are 
faced with the practical problem of determining on the one hand which age groups 
are most negatively oriented and therefore in need of special educative effort, and on 
the other hand which age groups provide the best resource in terms of being more 
positively oriented. In short, toward which group should what type of effort be 
directed in order to further more effective integration of the town and hospital com- 
munities? Analogous considerations apply to the other identifying population char- 
acteristics selected in this study. 
RESULTS 


The six attitude scores (one total score and five subareas) were divided into 
quartiles and the significance of the relationships of each with the independent 
measures above were tested with chi-square. The points of significance can be seen 
in Table 1. 
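
For readers unfamiliar with the procedure, the chi-square test of a quartiled attitude score against a grouped population characteristic can be sketched as follows. The counts below are entirely hypothetical and were chosen only so that the arithmetic is easy to follow; they are not the study's data. The critical value is the standard tabled .01 value for 9 degrees of freedom, and the function name `chi_square` is introduced here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# HYPOTHETICAL counts for 400 respondents.
# Rows: four age groups, youngest first.
# Columns: quartiles of total score, most positive quartile first.
hypothetical_counts = [
    [35, 30, 20, 15],
    [25, 30, 25, 20],
    [20, 25, 25, 30],
    [20, 15, 30, 35],
]

stat, df = chi_square(hypothetical_counts)
CRITICAL_01_DF9 = 21.666  # tabled chi-square value, .01 level, 9 df

print(f"chi-square = {stat:.1f}, df = {df}, beyond .01 level: {stat > CRITICAL_01_DF9}")
```

Dividing both variables into groups of roughly equal size, as the authors did, keeps the expected count in every cell comfortably large, which is the condition under which the chi-square approximation is usually trusted.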


TABLE 1. SIGNIFICANT CHI-SQUARES BETWEEN ATTITUDE SCORES AND INDEPENDENT VARIABLES 

Columns: Total Score; Political Area; Criminally-Insane Area; Clinical Area; Economic Area; 
Social Area; Businessmen Sample. 

Entries are listed for each variable in the order in which they are legible, left to right. 

Sex: P < .01, Males* 
Age: P < .01, Youngest* (15-30 yr. olds); P < .01, 30 yr. olds**; P < .02, Youngest* 
Marital Status: P < [illegible], Single*; P < .01, Single* 
Ownership of Rental Property: P < .01, No property 
Larned as Birthplace: P < .01, Not native* 
Income: P < .02, $3,000* 
Length of Residence in Larned: P < .01, Shortest Residence*; P < .01, Shortest* (2½ yrs.) 
Occupation: P < .0[illegible], Professional*, Farmers & Cler. Wkrs.**; P < .01, Skilled trades*, 
Housewives**; P < .05, Housewives, Teachers*, Farmers, Merchants** 

*Group entered is the one expressing the most positive attitudes. 
**Group entered is the one expressing the most negative attitudes. 


Total Score. The relationship between age and total score is significant beyond the 
.01 level of confidence. Inspection of the scores³ shows an inverse relationship between 
age and positive attitude, i.e., with an increase in age there is a decrease in the 
number of people with positive attitudes toward the hospital. The significance between 
total score and both marital status and length of residence may be partly a 
function of age. Married people and people who have a longer residence in the town-community 
are the ones who also have a higher percentage of negative attitudes. 
This inference has face validity and was not investigated further. Another significant 
relationship was found to obtain between total score and occupation. Here it was 
found that the most positive groups consisted of professional men, students, teach- 
ers, and a group composed of county officers and managers. In the middle of the 
range of decreasing positive attitudes were the merchants, semiprofessional people 
(nurses, laboratory technicians, etc.) and housewives. The most negative attitudes 
were expressed by individuals in the skilled trades, farmers and clerical workers. 


Political. The points of significance in this area do not differ basically from those of 
total score of which this is a part. The significance of unmarried people expressing 
more positive attitudes is clearly maintained, but in the relationship with occupation 
the students now represent the group expressing the most positive attitudes. 


Criminally-insane. These scores are the only ones in which sex is significant—the 
men have more positive attitudes than women toward criminally-insane patients. 
The significant relationships in the other dimensions support the findings for total 


*Copies of the distributions of scores are available upon request from the authors. 
score with the following differences: People with an income of less than $3,000 per 
year were significantly less positive in their attitudes than all other (higher) income 
groups. This would corroborate for this group the findings of Hollingshead and 
Redlich®? in their study of relationships between class and attitudes toward mental 
illness. The occupation hierarchy shows that the skilled trades have the most positive 
attitudes while the housewives expressed the most negative feelings. 


Clinical. Contrary to expectations, none of the relationships between the independent 
variables and the responses in the clinical area is significant. This means 
that these questions as a group do not contribute significantly to the differentiating 
capacity of total score for the present population. More theoretically we would have 
to conclude that these questions deal with an attitude area which is quite homogeneous 
and stable across all segments of the population tested. 


Economic. In this area only occupation is significant and then only at the .05 level. 
Housewives and teachers present the most liberal attitudes while farmers and 
merchants are most negative in their perception of the economic potential of the 
hospital for the town-community. 


Social. The questions dealing with social attitudes toward staff members show 
significant relationship with age. The most positive attitudes are found in the 
youngest group, thus contributing to the significance of age and total score. The 
other point of significance for these questions is the discrimination between people 
who do and who do not own rental property. Those who don’t are more positively 
inclined toward the hospital. It should be pointed out that the building of a large 
modern dormitory on the hospital grounds eliminated some of the rental exploita- 
tion in town, but also created considerable resentment in town landlords who found 
themselves unable to compete. 


Businessmen. The only significant difference here is between the businessmen who 
were born in Larned as opposed to those born elsewhere. The latter have the more 
positive attitudes, suggesting that newcomers bring with them attitudes toward 
mental hospitals which are more progressive than those held by natives or old-timers 
whose attitudes may represent a time lag associated with growing up within the 
climate of negative stereotypes. 


DISCUSSION 

Though far from approaching potential goals, this mental hospital has made 
substantial progress during the past several years in terms of the transition from a 
purely custodial to a treatment oriented center. Over-all findings indicate that the 
town-community has likewise made considerable progress in updating some of its 
attitudes toward mental illness and its perception of the mental hospital. This pro- 
gress, however, has not yet begun to encompass potential integrative goals of pos- 
itive community identification with the mental hospital nor involvement in the total 
treatment program. Time lags remain and townspeople in many ways still perceive 
the institution and mental illness in terms of negative stereotypes which (though 
some attitudes may have been justifiable twenty years ago) are now outdated and 
unwarranted. 

Both the positive and negative aspects of these trends in attitude would appear 
to corroborate the general findings of other studies in analogous situations ®: 4 5 8. 9), 
The most positive attitudes were expressed by the high school students (the entire 
senior class of the town’s high school was tested). The possible explanation may lie 
in the relatively progressive climate toward mental health within the educational 
system. Activities sponsored by the hospital such as the annual Mental Health 
Week program“) evidently find their most responsive audience in this age group 
with the possible exception of unique older groups such as clergymen “), 

Attitude differences between vocational groups were also clearly demonstrated. 
It was found that the vocations could be ordered into a favorable-unfavorable 
hierarchy in terms of these attitude differences. Inspection of the vocational hierarchy 
obtained, in relation to total score, would suggest that a sizeable amount of 
the variance could be assumed to have been accounted for by the education factor. 
Such differences will be investigated further in relation to specific items.⁴ 


SUMMARY 


Four hundred residents of a small typical midwestern community were ad- 
ministered a questionnaire designed to assess attitudes toward a large (1500 bed, 700 
staff) state mental hospital (considered itself as a community) located adjacent to 
the town-community. The sample constitutes approximately 13% of the total adult 
population not employed at the hospital. The following areas of attitude toward the 
hospital were investigated: clinical, social, economic, political and toward criminally-insane 
patients. Over-all (total score) favorable-unfavorable attitudes, and favor- 
able-unfavorable attitude scores for these specific areas were analyzed in terms of 
their relationships with selected population characteristics. 

Results indicated significant shifts in attitude between generations (younger 
more positive) and significant relationships between attitudes toward the mental 
hospital and other population characteristics, e.g. marital status, income, frequency 
of contact with the hospital, occupation, length of residence, etc. Some of the significant 
relationships found in addition to the important age differential were as 
follows: Scores representing attitudes toward the criminally-insane were the only 
ones in which sex was significant, with men expressing more positive attitudes than 
women. With respect to questions dealing with the social valence of the mental 
hospital, it was found that townspeople who do not own rental property expressed 
the most positive attitudes. 

With reference to occupational groups, professional people and students ex- 
pressed the most positive attitudes while farmers expressed the most negative. There 
were two interesting exceptions: First, in relation to the criminally-insane, in which 
case housewives expressed the most negative attitudes, possibly representing anxiety 
and fear associated with this patient group. Secondly, in relation to items in the 
economic area where housewives and teachers were found to have the most favorable 
attitudes with merchants as well as farmers having the most negative. Merchants, 
to a greater degree than other groups, evidently perceive the hospital as providing 
facilities to potential customers in areas that encroach upon their business interests. 
CAPACITY AND MOTIVATIONAL DIFFERENCES IN VERBAL RECALL¹ 
FRANK PEDERSEN AND DAVID MARLOWE 
The Ohio State University College of Medicine, University of Kentucky 


PROBLEM 


In a recent study, Adelson and Redmond“? reported the experimental confirma- 
tion of an hypothesis linking differences in amount of verbal recall to anal retention 
and anal expulsion. They found that anal retentive females (as measured by the 
Blacky Test) performed significantly better than anal expulsive females on immediate 
and delayed recall of “disturbing” and “innocuous” passages. No significant 
difference was found between retentives and expulsives on intelligence. These find- 
ings were interpreted as offering support for the psychoanalytic proposition that 
differing forms of ego organization result in variations in cognitive style which in 
turn reflect capacity differences for verbal recall. Adelson and Redmond noted, how- 
ever, that an alternative and equally plausible explanation of their findings could be 
formulated by assuming that the recall differences were based on motivational differ- 
ences between expulsives and retentives. Thus, the retentives could be viewed as 
compliant, eager to do well, and working hard at the tasks whereas the expulsives 
could be characterized as rebellious and lacking in interest. 

The present study had two purposes. First, an attempt was made to replicate 
Adelson and Redmond’s findings; second, we wished to test the possibility that differ- 
ences between anal retentives and expulsives in verbal recall reflect motivational 
differences between these groups. The motivational construct employed was need 
achievement (n-Achievement). Previous research“) has demonstrated that high 
need achievers work harder and more persistently at a task than do low need achievers 
when performance is viewed as instrumental to the satisfaction of the desire for 
success or accomplishment. It seemed reasonable to expect, therefore, that strength 
of achievement motivation would not only be related to differences in verbal recall, 
but would also be related to the retentive-expulsive variable. 


METHOD 


Subjects. The sample consisted of 70 college male students at The Ohio State Uni- 
versity. The use of male Ss rather than females as in Adelson and Redmond’s study 
seems appropriate since psychoanalytic theory does not specify any sex differences 
in regard to the behavioral consequences of anal fixation. 


The Anality Measure. Scores obtained on the Blacky Test served as the basis for 
assigning Ss to either the Expulsive, Retentive or Neutral groups. An S was designated 
Anal Retentive if his combined score on the free association and inquiry was 
“very strong” or “fairly strong” on Anal Retention. Similarly, an S was designated 
Anal Expulsive if his combined score was “very strong” or “fairly strong” on Anal 
Expulsion. Ss who gave neither predominantly Expulsive nor Retentive responses 
were assigned to the Neutral category. The groups were composed of 19 expulsives, 
23 retentives, and 28 neutrals. One expulsive and two neutral Ss failed to return for 
the second part of the experiment. 


The “Disturbing” and “Innocuous” Passages. The “innocuous” passage used was 
identical to that employed by Adelson and Redmond. This passage was scored for 
recall by giving one point for each word correctly reproduced (except for conjunctions 
and prepositions) and two points for accurate [illegible] and numbers. The “disturbing” 
passage was scored for thought units. In the “disturbing” passage minor 
changes were made in the wording of two paragraphs to make them more suitable 


1The writers wish to acknowledge the ions of Alvin Scodel while writing this paper, and the 
assistance of Harve Rawson in the collection of the data. 
for male Ss, i.e., appropriate sex changes were made in the paragraph dealing with the 
oedipal stage, and castration anxiety was described in place of the paragraph on 
penis envy. 


The n-Achievement Measure. Each S wrote six four-minute imaginative (TAT-type) 
stories under group conditions according to the standard procedure of McClelland ©). 
Cards 3, 8, 5, 7, 28, and 4 of the standard series were administered in that order. The 
n-Achievement scores were obtained by scoring the stories for Achievement Imagery 
and Achievement Thema according to the definitions of McClelland“. The possible 
range of scores was 0 to 12. A mean of 4.97 and a standard deviation of 2.14 were 
obtained. A measure of the inter-rater reliability for the scoring of n-Achievement 
was obtained by correlating the scores obtained by the writer with those obtained 
independently by another scorer. The resulting correlation, based on the stories of
20 Ss, was .86.
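
An inter-rater reliability coefficient of this kind can be computed as a Pearson correlation between the two scorers' totals. The sketch below shows the arithmetic only; the score lists are hypothetical values invented for illustration and are not the data of the present study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical n-Achievement totals (0 to 12) given by two scorers to the
# same stories; these are illustrative values, not the study's data.
scorer_a = [4, 6, 2, 7, 5, 3, 8, 5]
scorer_b = [5, 6, 2, 6, 4, 3, 8, 6]
reliability = pearson_r(scorer_a, scorer_b)
```

The function assumes that each scorer's totals show some variation; if either list is constant the correlation is undefined and the division fails.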


Procedure. The Ss appeared for the experiment in groups of 2 to 7. The Blacky 
Test was administered first. The Ss were then handed the two passages which were 
printed on separate sheets of paper and were told, “Read these papers carefully. You
will be asked questions on them later.” Exactly 10 minutes were allowed for the
reading of the passages. The Ss were then told, “Now write down all you can remember
about what you have just read.” One week later the Ss returned and were
asked to “Write down all you can remember about the two passages you read last
week.” Immediately after all the Ss had completed the delayed recall task, the
measure of n-Achievement was administered.


RESULTS AND DISCUSSION


The results reported in Table 1 indicate only one significant difference between 
retentives and expulsives in amount of verbal recall, that for delayed recall of the 
disturbing passage where the expulsives obtained a significantly higher mean recall 
score. The mean recall score for expulsives was also higher than that for retentives
on immediate recall of the disturbing passage, though not significantly so. The
retentives obtained higher mean scores for immediate and delayed recall of the
innocuous passage, although for both of these comparisons the differences fail to reach
significance at the .05 level (two-tailed test). Thus, our results constitute a failure to 
replicate the major findings of Adelson and Redmond. For two of the comparisons, 
our results are the reverse of those obtained by Adelson and Redmond. These
reversals cast considerable doubt on Adelson and Redmond’s conclusion that
retentives and expulsives differ in capacity for verbal recall, with the retentives
possessing superior ability.

With the exception of minor changes in the wording of the disturbing passage,
and the use of male Ss in the present study, our research duplicates almost exactly 
the methodology described in the original report. Psychoanalytic theory, however, 
would not lead us to expect a sex difference in the behavioral outcomes of anal fixa- 
tion. There can be little doubt, though, that the Bennington students who served as 
Ss in the original study constituted a group distinctly different from Ohio State 
University (OSU) students on such variables as attitudes, values, interests, etc. We 
believe that the recall differences between the expulsives and retentives in our sample 
and between our sample and the Bennington Ss may be attributable to differences 
in the “threat value” of the disturbing passage for these groups. We would
hypothesize that the disturbing passage was less threatening and probably more
interesting to the Bennington Ss than to the OSU Ss. Since no restriction was
placed on how much of the 10 minutes the Ss could devote to each passage, it is easy
to imagine the more “sophisticated” Bennington Ss devoting less time to the
relatively uninteresting and rather bland innocuous passage. Both our expulsive and
retentive Ss obtained recall scores significantly higher than those of the Bennington 
Ss on the innocuous passage. Such a comparison is appropriate since the scoring of 
the innocuous passage in both studies was based on an objective counting of words, 
dates and numbers accurately reproduced. In addition, many of our expulsive and 
retentive Ss in recalling the disturbing passage substituted non-threatening words in 
place of such words as “penis”, “suck”, “bite”, etc.

It is also possible that within our sample, the disturbing passage was more
threatening to the retentives than to the expulsives. The traits of orderliness and
pedantry that are ascribed to anal retentives by Freudian theory should result in the
retentives devoting more time to material which is “academic”, factual and
anxiety-free in nature. Our findings provide some support for this formulation, i.e., the
retentives in the OSU sample tended to recall more than the expulsives on immediate
and delayed recall of the innocuous passage. Since the Ss, however, were free to ap- 
portion the ten minutes reading time in any way they chose, we have no way of 
determining exactly how much time the Ss devoted to each passage. Further research 
might control for this factor by requiring all the Ss to spend an equal amount of time 
on each passage.

Our belief that n-Achievement might be related to the retention-expulsion
variable was not confirmed. Table 1 indicates that anal retentives do not differ
significantly from anal expulsives in the strength of their achievement motivation. Need
achievement, however, is correlated very slightly with verbal recall scores, although 
for each of the four comparisons the correlation fails to reach significance (Table 2). 
The magnitude of these correlations confirms previous findings that, at best, only 
low to moderate relationships can be expected between a fantasy measure of n- 
Achievement and some complex overt behavior when the influence of the situation 
and the S’s subjective expectancies are not taken into account. ®: * 4) 

We had available for 51 of our 70 Ss Ohio State Psychological Examination 
(OSPE) scores, a measure of verbal intelligence. Here, our finding confirms that of 
Adelson and Redmond: no significant difference was obtained between anal retentives 
and anal expulsives on intelligence. Parenthetically, it should be noted that in this 
study, n-Achievement is significantly related to intelligence, r = .35 (Table 2). In 
another study, Liverant (personal communication) collected n-Achievement and 
OSPE scores for 122 OSU males who are directly comparable to the Ss in the present 
study. Liverant obtained a -.01 correlation between n-Achievement and OSPE
scores. This significant difference in correlations might be accounted for by the fact 
that our n-Achievement scores were obtained under conditions of motive arousal, 
i.e., the achievement measure was administered immediately after the “test” of
delayed recall. In contrast, Liverant’s n-Achievement scores were obtained under 
standard neutral group conditions. Conceivably, n-Achievement is significantly cor- 
related with intelligence only when the n-Achievement measure is administered in a 
“test-like” setting somewhat comparable to an “academic-test” situation.
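
The report does not state how the difference between the two correlations was tested. One conventional approach, offered here only as an illustrative sketch, is Fisher's r-to-z transformation for correlations from independent samples. The values r = .35, r = -.01 and n = 122 are taken from the text; n = 51 for the present sample is an assumption (the number of Ss with OSPE scores), since the exact n underlying the .35 correlation is not reported.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and its two-tailed p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # erfc(|z| / sqrt(2)) equals twice the upper-tail normal probability.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# r = .35 (present study) and r = -.01 with n = 122 (Liverant) are from the
# text; n = 51 for the present study is an assumption, not a reported value.
z, p = compare_independent_correlations(0.35, 51, -0.01, 122)
```

With the assumed n of 51, the difference falls beyond the .05 level on a two-tailed test, which is in keeping with the description of the difference as significant; a smaller n for the present sample would weaken the result.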

TABLE 2. INTERCORRELATIONS AMONG NEED ACHIEVEMENT, INTELLIGENCE AND VERBAL RECALL
(PEARSON r)

Overall, a good part of the variance in recall scores can be attributed to
intelligence. In Table 2, we find that OSPE scores are significantly correlated with verbal
recall in three of the four comparisons, while in the remaining comparison the cor- 
relation just misses significance. It is hardly unexpected that intelligence is related 
to verbal recall under the present experimental conditions. Nevertheless, one cannot 
differentiate retentives from expulsives on the basis of verbal intelligence. The
suggestive nature of our findings, coupled with those of Adelson and Redmond, appears to
warrant further analysis of the anality-recall relationship with more rigorous
controls.


SUMMARY 


The present study attempted to replicate a previously reported finding of Adelson
and Redmond that anal retentive individuals have a greater capacity for verbal
recall than anal expulsive persons. Evidence was also sought concerning possible 
relationships between anality, verbal recall and n-Achievement. 

Our findings indicate that expulsives tend to recall more disturbing material 
than the retentives, whereas the retentives tend to recall more innocuous material. 
These findings constitute a failure to replicate the major results of Adelson and Red- 
mond. The recall differences obtained between retentives and expulsives in the 
present study were interpreted as due to the possible influence of such factors as the 
extent to which the passages differ in interest and threat value, as well as the possible 
effect of variations in the amount of time devoted to each passage.

Retentives and expulsives did not differ in the strength of their achievement 
motivation, nor was strength of achievement motivation related to differences in 
verbal recall. A moderately high correlation obtained between n-Achievement and 
intelligence was interpreted as due to the fact that both measures were administered 
in “test-like” settings.


EDITORIAL OPINION





GETTING THE “CLINICAL” BACK INTO PSYCHOLOGY 


It is paradoxical that while test constructors generally are required to validate 
their measures against some ultimate criterion of performance, few psychologists 
accept the challenge of validating their own judgments against prognostic
outcomes. Many psychologists function professionally for years with
no validation of their theories and methods and, indeed, they are safe as long as they 
do not scrutinize their own results too closely. Clinical practice offers the psychol- 
ogist one of the few opportunities for validating his methods in terms of whether out- 
comes actually verify predictions. It is indeed a sobering experience for any kind of 
a psychologist to discover that his judgments do not exceed chance expectancy. 

It is pertinent to consider how various kinds of psychologists would be labelled 
if their activities were described in strict operational terms. Experimentalists who 
rarely deal with the whole person might be called research design specialists. Factor 
analysts (and other mathematical experts) who study behavior on the basis of 
quantitative scores could well be called behavior statistics specialists. Academic 
theorists would be designated as behavior philosophers, comparative psychologists as 
behavior zoologists, physiological psychologists as behavior physiologists, etc. Instead 
of granting the title of psychologist to adjunctive specialists, it might be reserved for 
those whose activities are defined operationally by their concern with the
whole-organism-meeting-its-environment. If the title were restricted to those who
operationally perform diagnostic and therapeutic functions or research with actual persons
at a high level of competency, few of those who now claim the title would qualify. 

In recent years, perhaps because of preoccupation with theory, methodology, 
research design or academic-administrative interests, many “so-called” clinicians are
progressively withdrawing from clinical work. Many factors may contribute to this 
trend. As academic or administrative responsibilities accumulate, the older psy- 
chologist may gradually drop clinical work. There seems to be a tendency to leave 
the “chores” to beginners in the field. This trend seems to have been accelerated by 
recent research questioning the validity of the clinical judgments of many estab- 
lished specialists who are only too glad to escape from situations questioning com- 
petency. In any case, this distressing tendency to withdraw from actual clinical 
work as one gets older should be reversed since every clinician needs to keep himself 
keen and to continually polish his tools, and patients are bound to benefit from the 
active maturity of the clinician who has continued to pile up a wealth of human 
experience as he grows older. 

Another way to get the “clinical” back into psychology is to devote more time
in academic curricula and training programs to actual contacts with case materials. 
Whereas 40 years ago medicine was primarily taught by academic study and lecture 
methods, today the medical student is introduced to clinical materials beginning in 
the first year and by the fourth year everything is clinical preceptor-taught. As of 
1960, training in psychology is still academic-theoretically oriented with the student 
being required to verbally master a confusing welter of theoretical schools. Indeed,
academic theory is so overweighted and involves such difficult semantic problems
that the student is required to be a semantic specialist to the exclusion of attaining
clinical experience. Perhaps the two groups of skills are mutually exclusive, only 
rarely occurring in the same person. 

There is great need for the clinician, in training and in professional work, to 
constantly develop clinical acuity using his own psychological abilities to analyse 
behavior directly. Too many so-called “clinicians” are merely technicians, adept in
the use of some test or method but rarely employing their entire armamentarium in